Synchronization protocol

ABSTRACT

A monolithic integrated circuit for use in a system having a host processor, the integrated circuit including at least one input pin for receiving command data from the host processor; at least one output pin for transmitting processed image data to the host processor in response to the command data; and an image processor configured to generate the processed image data by performing an image-processing function on image data captured by an image sensor.

CO-PENDING APPLICATIONS

Various methods, systems and apparatus relating to the present invention are disclosed in the following co-pending applications filed by the applicant or assignee of the present invention simultaneously with the present application:

NPS047, NPS048, NPS049, NPS050, NPS051, NPS052, NPS053.

The disclosures of these co-pending applications are incorporated herein by cross-reference. Each application is temporarily identified by its docket number. This will be replaced by the corresponding USSN when available.

CROSS-REFERENCES

Various methods, systems and apparatus relating to the present invention are disclosed in the following co-pending applications filed by the applicant or assignee of the present invention. The disclosures of all of these co-pending applications are incorporated herein by cross-reference:

10/409,876 10/409,848 10/409,845 09/575,197 09/575,195 09/575,159 09/575,132 09/575,123 09/575,148 09/575,130 09/575,165 09/575,153 09/693,415 09/575,118 09/609,139 09/608,970 09/575,116 09/575,144 09/575,139 09/575,186 09/575,185 09/609,039 09/663,579 09/663,599 09/607,852 09/575,191 09/693,219 09/575,145 09/607,656 09/693,280 09/609/132 09/693,515 09/663,701 09/575,192 09/663,640 09/609,303 09/610,095 09/609,596 09/693,705 09/693,647 09/721,895 09/721,894 09/607,843 09/693,690 09/607,605 09/608,178 09/609,553 09/609,233 09/609,149 09/608,022 09/575,181 09/722,174 09/721,896 10/291,522 10/291,517 10/291,523 10/291,471 10/291,470 10/291,819 10/291,481 10/291,509 10/291,825 10/291,519 10/291,575 10/291,557 10/291,661 10/291,558 10/291,587 10/291,818 10/291,576 10/291,589 10/291,526 6,644,545 6,609,653 6,651,879 10/291,555 10/291,510 19/291,592 10/291,542 10/291,820 10/291,516 10/291,363 10/291,487 10/291,520 10/291,521 10/291,556 10/291,821 10/291,525 10/291,586 10/291,822 10/291,524 10/291,553 10/291,511 10/291,585 10/291,374 NPA125US 10/685,583 NPA127US 10/685,584 NPA133US 09/575,193 09/575,156 09/609,232 09/607,844 09/607,657 09/693,593 NPB008US 09/928,055 09/927,684 09/928,108 09/927,685 09/927,809 09/575,183 09/575,160 09/575,150 09/575,169 6,644,642 6,502,614 6,622,999 09/575,149 10/322,450 6,549,935 NPN004US 09/575,187 09/575,155 6,591,884 6,439,706 09/575,196 09/575,198 09/722,148 09/722,146 09/721,861 6,290,349 6,428,155 09/575,146 09/608,920 09/721,892 09/722,171 09/721,858 09/722,142 10/171,987 10/202,021 10/291,724 10/291,512 10/291,554 10/659,027 10/659,026 09/693,301 09/575,174 09/575,163 09/693,216 09/693,341 09/693,473 09/722,087 09/722,141 09/722,175 09/722,147 09/575,168 09/722,172 09/693,514 09/721,893 09/722,088 10/291,578 10/291,823 10/291,560 10/291,366 10/291,503 10/291,469 10/274,817 09/575,154 09/575,129 09/575,124 09/575,188 09/721,862 10/120,441 10/291,577 10/291,718 10/291,719 10/291,543 10/291,494 10/292,608 10/291,715 10/291,559 10/291,660 10/409,864 10/309,358 10/410,484 NPW008US NPW009US 09/575,189 09/575,162 09/575,172 09/575,170 09/575,171 09/575,161 10/291,716 10/291,547 10/291,538 10/291,717 10/291,827 10/291,548 10/291,714 10/291,544 10/291,541 10/291,584 10/291,579 10/291,824 10/291,713 10/291,545 10/291,546 09/693,388 09/693,704 09/693,510 09/693,336 09/693,335 10/181,496 10/274,199 10/309,185 10/309,066

Some applications have been listed by docket number; these will be replaced when the application numbers are known.

FIELD OF THE INVENTION

The present invention relates to the field of monolithic integrated circuits, and, more particularly, to image capture and image processing.

The invention has been developed for use in a hand-held stylus configured to capture coded data disposed on a substrate, and will be described hereinafter with reference to that application. However, it will be appreciated that the invention can be applied to other devices.

GLOSSARY

This section lists the acronyms, abbreviations and similar information used in this specification.

-   BIST: Built-in self test
-   DNL: Differential non-linearity
-   ESD: Electro-static discharge
-   FPN: Fixed pattern noise
-   INL: Integral non-linearity
-   PGA: Programmable gain amplifier
-   PVT: Process-Voltage-Temperature

BACKGROUND OF INVENTION

Monolithic integrated circuit image sensors are known in the art. Examples include Charge-Coupled Devices (CCDs) and CMOS image sensors. Refer, for example, to Janesick, J. R., Scientific Charge-Coupled Devices (SPIE Press 2001); Holst, G. C., CCD Arrays, Cameras and Displays (SPIE Press 1996); and Moini, A., Vision Chips (Kluwer Academic Publishers 1999). Digital image processing algorithms are known in the art. Refer, for example, to Gonzalez, R. C. and R. E. Woods, Digital Image Processing (Addison-Wesley 1992).

Image sensors such as CMOS and CCD image capture devices are known. Such devices are typically designed to work in conjunction with an external framestore and a host processor.

One of the issues that arises when such image sensors are used in systems with a host processor is that the link between the image sensor and the host processor must support the relatively high read-out data rate of the image sensor.

It is an object of the invention to provide alternative architectures that overcome some of the problems associated with direct coupling between the image sensor and the host processor.

Active pixel cells have a storage node which stores a charge. During an integration period, the stored charge is modified from an initial level. Once the integration is completed, the amount of charge determines an output voltage, which can be used to drive an output circuit. The output of the output circuit is controlled by the voltage, and hence the charge, of the storage node.

In conventional pixel cells, switching into and out of the integration period causes one or more voltage drops at the storage node due to various capacitances in the circuit. This reduces the potential dynamic range of the pixel cell.

It would be desirable to provide a pixel cell that overcomes or at least reduces the impact of these voltage drops without requiring complicated additional circuitry. It would be even more desirable if a fill factor of such a pixel cell was not substantially different to that of prior art pixel cells.

SUMMARY OF THE INVENTION

In a first aspect the present invention provides a monolithic integrated circuit including an image sensor for capturing image information; at least one analog to digital converter for converting analog signals corresponding to the image information into digital image data; and a first framestore for storing frames of the digital image data.

In a second aspect the present invention provides a monolithic integrated circuit including an image sensor for capturing image information; at least one analog to digital converter for converting analog signals corresponding to the image information into digital image data; and an image processor, the image processor including a low-pass filter for filtering the image data, thereby to generate filtered image data.

In a further aspect the present invention provides a monolithic integrated circuit including an image processor, the image processor including a low-pass filter for filtering digital image data received from an image sensor, thereby to generate filtered image data; and a subsampler for subsampling the filtered image data, thereby to generate subsampled image data; and a subsampled framestore, the monolithic integrated circuit being configured to store the subsampled image data in the subsampled framestore.
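The filter-then-subsample order described in this aspect can be illustrated with a minimal software sketch. The 3×3 box kernel, the function names and the list-of-lists image representation are assumptions made purely for illustration; this aspect does not fix a particular filter or subsampling factor.

```python
def box_lowpass(img):
    """Apply a 3x3 box (mean) filter; edge pixels average only the neighbours that exist.

    The box kernel is an illustrative assumption, not the filter of the specification.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) // len(vals)
    return out


def subsample(img, factor):
    """Keep every `factor`-th pixel in both directions."""
    return [row[::factor] for row in img[::factor]]


def filter_and_subsample(img, factor):
    """Low-pass first, then subsample, so that the discarded pixels still contribute."""
    return subsample(box_lowpass(img), factor)
```

Filtering before subsampling limits aliasing, which is why the two steps appear in that order; the result of `filter_and_subsample` corresponds to the data that would be written to the subsampled framestore.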

In a third aspect the present invention provides a monolithic integrated circuit comprising an image sensor for sensing image information; at least one analog to digital converter for converting analog signals corresponding to the image information into digital image data; and an image processor, the image processor including a range expansion circuit for range expanding the digital image data.

In a fourth aspect the present invention provides a photodetecting circuit comprising a photodetector for generating a signal in response to incident light; a storage node having first and second node terminals, the first node terminal being connected to the photodetector to receive the signal such that charge stored in the node changes during an integration period of the photodetecting circuit; and an output circuit for generating an output signal during a read period of the photodetecting circuit, the output signal being at least partially based on a voltage at the first terminal; the photodetecting circuit being configured to receive a reset signal; integrate charge in the storage node during an integration period following receipt of the reset signal; and receive a compensation signal at the second terminal of the storage node at least during the read period, the compensation signal increasing the voltage at the first terminal whilst the output circuit generates the output signal.

In a further aspect the present invention provides a method of sensing a pixel value comprising the steps of, in a photodetector circuit:

-   resetting the circuit;
-   generating a photocurrent in a photodetector in response to light falling on the photodetector;
-   modifying a charge in a storage node over an integration period in accordance with the photocurrent;
-   at the end of the integration period, reading the charge in the storage node to determine the pixel value, the step of reading including the substep of applying a compensatory voltage to a terminal to at least partially compensate for one or more voltage drops associated with the commencement and/or termination of the integration period.

In a fifth aspect the present invention provides a monolithic image sensing device, including an image sensor for sensing image data; timing circuitry for generating at least one internal timing signal, the image sensor being responsive to at least one of the internal timing signals to at least commence sensing of the image data; and at least one external timing signal; at least one external pin for supplying the at least one external timing signal to at least one peripheral device.

In a sixth aspect the present invention provides a monolithic integrated circuit comprising an image processor configured to make each of a series of frames of image data available to a host processor, the image processor being configured to receive a first message from the host processor indicative of the host processor not requiring further access to the image data prior to a subsequent frame synchronisation signal; in response to the first message, cause at least part of the integrated circuit to enter a low power mode; and in response to a frame synchronisation signal, cause the part of the integrated circuit in the low power mode to exit the low power mode.

In a seventh aspect the present invention provides a monolithic image sensing device including an image processor, the integrated circuit being configured to operate in a system having a host processor, the image processor being configured to receive, from the host processor, a request for access to a next available frame of image data from a framestore; in the event the frame of image data is available, send a message to the host processor indicative of the image data's availability; and in the event the frame of image data is not available, wait until it is available and then send a message to the host processor indicative of the image data's availability.

In a further aspect the present invention provides a monolithic integrated circuit including an image processor, the integrated circuit being configured to operate in a system having a host processor and a framestore, the image processor being configured to receive a message from the host processor confirming that image data in the framestore is no longer required; and in the event that new image data is received to be stored in the framestore prior to the message being received, discard the new image data.
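The frame request, availability message and frame discard behaviour summarised in the preceding aspects can be pictured with a small behavioural model. This is a sketch only: the class, method and message names are hypothetical, a single-buffer framestore is assumed, and the model says nothing about how the integrated circuit implements the behaviour.

```python
class FrameStoreModel:
    """Single-buffer behavioural model of the host/image-processor handshake (illustrative)."""

    def __init__(self):
        self.frame = None             # frame currently held for the host, if any
        self.request_pending = False  # host asked for a frame that is not yet available
        self.sent = []                # messages sent to the host (hypothetical format)
        self.discarded = 0            # frames dropped because the buffer was still in use

    def on_sensor_frame(self, frame):
        """New image data arrives from the sensor."""
        if self.frame is not None:
            # The host has not yet confirmed it is finished: discard the new data.
            self.discarded += 1
            return
        self.frame = frame
        if self.request_pending:
            self._notify()

    def on_host_request(self):
        """Host requests access to the next available frame."""
        if self.frame is not None:
            self._notify()
        else:
            self.request_pending = True   # wait, then notify when a frame arrives

    def on_host_release(self):
        """Host confirms the image data in the framestore is no longer required."""
        self.frame = None

    def _notify(self):
        self.request_pending = False
        self.sent.append(("frame_available", self.frame))
```

In this model a frame that arrives before the host releases the buffer is counted in `discarded` rather than overwriting data the host may still be reading.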

In an eighth aspect the present invention provides a monolithic integrated circuit for use in a system having a host processor, the integrated circuit including at least one input pin for receiving command data from the host processor; at least one output pin for transmitting processed image data to the host processor in response to the command data; and an image processor configured to generate the processed image data by performing an image-processing function on image data captured by an image sensor.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1. Jupiter system diagram

FIG. 2. Detailed architecture of Jupiter

FIG. 3. Timing diagram of the image sensor event signals in Freeze-Framemode

FIG. 4. Timing diagram of image sensor data interface

FIG. 5. Timing diagram of the ADC during a conversion cycle

FIG. 6. Timing diagram of the ADC during a calibration cycle

FIG. 7. Timing diagram of the clock multiplier

FIG. 8 a. First embodiment of a shuttered pixel circuit

FIG. 8 b. Second embodiment of a shuttered pixel circuit

FIG. 9. Typical timing diagram of a shuttered pixel during the integration cycle

FIG. 10. The new pixel design to compensate for reset voltage drop

FIG. 11. Schematic diagram of the column circuit

FIG. 12. Timing diagram during integration cycle

FIG. 13. The timing diagram of the read-out cycle

FIG. 14. Schematic diagram of the row decoder circuit

FIG. 15. Schematic diagram of level shifter

FIG. 16. Bias generator circuit

FIG. 17. Layout of the 10 um pixel using a photodiode and the capacitor

FIG. 18. Layout of the 10 um pixel using a photodiode and without the capacitor

FIG. 19. Layout of the 10 um pixel using a BJT

FIG. 20. Block diagram of the sensor

FIG. 21. The structure of a pipelined ADC

FIG. 22. A bit-slice of the switched capacitor based ADC

FIG. 23. The structure of three bit slices of the ADC in one phase of the clock

FIG. 24. The structure of the differential folded cascode circuit used in the ADC

FIG. 25. The bias generator circuit for the PGA and ADC

FIG. 26. The common mode feedback circuit

FIG. 27. The gain boosting amplifiers

FIG. 28. The clock generator

FIG. 29. The reference current generator

FIG. 30. Resistive ladder used in the bias current generator

FIG. 31. The schematic diagram of the comparator

FIG. 32. Common mode and reference voltage generator

FIG. 33. The wide-range OTA used in the reference voltage generators

FIG. 34. The structure of the bandgap generator

FIG. 35. The multistage opamp used in the bandgap generator

FIG. 36. The structure of the PGA

FIG. 37. The selectable capacitor structure used in the PGA

FIG. 38. The compensation structure used in the PGA opamp

FIG. 39. The floorplan of the ADC

FIG. 40. The block diagram of the ADC

FIG. 41. Timing diagram of the ADC in the normal mode

FIG. 42. Callisto system diagram

FIG. 43. Coordinate system

FIG. 44. Sub-sampling

FIG. 45. Sub-sampling pixel replication

FIG. 46. Dynamic range expansion window

FIG. 47. Incomplete dynamic range expansion window

FIG. 48. Sub-pixel value

FIG. 49. General Callisto message format

FIG. 50. Register access message format

FIG. 51. Callisto command message format

FIG. 52. Register data message format

FIG. 53. Command data message format

FIG. 54. Command data format for processed image read command

FIG. 55. Frame sync message format

FIG. 56. Frame store write message format

FIG. 57. Frame store write message format

FIG. 58. Unprocessed image read command message

FIG. 59 a. Processed image read command with arguments

FIG. 59 b. Processed image read command without arguments

FIG. 60 a. Sub-sampled image read command with arguments

FIG. 60 b. Sub-sampled image read command without arguments

FIG. 61. Sub-pixel read command message

FIG. 62. Command execution and frame store write states

FIG. 63. Frame store buffer locking

FIG. 64. Error recovery cycle

FIG. 65. Reset timing

FIG. 66. Image sensor data interface timing

FIG. 67. Image sensor timing signals

FIG. 68. Image sensor timing—external capture

FIG. 69. Serial interface synchronous timing: 2 bytes back-to-back from Callisto to microprocessor

FIG. 70. Serial interface synchronous timing: single byte transfer from microprocessor to Callisto

FIG. 71. Error recovery timing using break

FIG. 72. External register interface read timing

FIG. 73. External register interface write timing

FIG. 74. Callisto top-level partitioning

FIG. 75. clk_driver logic

FIG. 76. register_read State Machine

FIG. 76 a. Four-byte Register Read Access

FIG. 77. serialif structure

FIG. 78. ser2par State Machine

FIG. 79. msg_sync State Machine

FIG. 80. msg_hand State Machine

FIG. 81. Register Write and Read Accesses

FIG. 82. Unprocessed-Processed-Subsampled Image Read Sequence

FIG. 83. Subpixel Read Command

FIG. 84. Direct Frame Store Write Sequence

FIG. 85. frame_handshaking State Machine

FIG. 86. header_generation State Machine

FIG. 87. sif_par2ser functional timing

FIG. 88. par2ser State Machine

FIG. 89. error_handler State Machine

FIG. 90. imgproc structure

FIG. 91. imgproc_fs State Machine

FIG. 92. Sub-functions of the Processed Image Read Function

FIG. 93. “Column Min-max” Generation

FIG. 94. “Column Min-Max” Pipeline and Range-Expand and Threshold

FIG. 95. Serial Output during Processed Image Region Read

FIG. 96. imgproc_sertim state machine

FIG. 97. imgsensif structure

FIG. 98. sens_ctrl state machine (fsm—double buffered)

FIG. 99. sens_ctrl state machine (onebuf—single buffered)

FIG. 100. synchronizer design

FIG. 101. reset_sync design

FIG. 102. sig_pulse_sync design

FIG. 103. New Frame events—Double buffering

FIG. 104. Single Buffer—Basic cadence

FIG. 105. Single Buffer—Normal operation

FIG. 106. Single Buffer—One missed frame

FIG. 107. Double Buffering—Same cadence as normal operation for single buffer

FIG. 108. Double Buffering—No missed frames, simultaneous read and write

FIG. 109. Double Buffering—One missed frame

FIG. 110. Generalized RAM Accesses

FIG. 111. Sub-sample Buffer RAM architecture

FIG. 112. Scan Test Operation

FIG. 113. Symmetric FIR parallel implementation

FIG. 114. Reuse of multiplier and adder tree

FIG. 115. 2-tap 2D FIR

FIG. 116. Symmetric 2D FIR's

FIG. 117. Block memory scheme decoupling decimation factors and filter order

FIG. 118. Reduced linestore 2D FIR

FIG. 119. Tag image processing chain

FIG. 120. First sample tag structure, showing symbol arrangement

FIG. 121. First sample tag structure, showing macrodot arrangement (fully populated with macrodots)

FIG. 122. Second sample tag structure, showing symbol arrangement

FIG. 123. Second sample tag structure, showing macrodot arrangement(fully populated with macrodots)

DETAILED DESCRIPTION OF PREFERRED AND OTHER EMBODIMENTS

The detailed description is broken down into sections for convenience.

Section A describes a preferred embodiment of the present invention in the form of the Jupiter image sensor chip with on-board image processing.

Section B describes the functions of the Ganymede image sensor component of Jupiter.

Section C describes the design of the Ganymede image sensor.

Section D describes the design of an 8-bit analog-to-digital converter (ADC) used by Ganymede.

Section E describes the functions and design of the Callisto image processor component of Jupiter.

Section F describes alternative filtering and subsampling circuits which may be utilised by Callisto.

Section G describes netpage tag sensing algorithms adapted to utilise the Callisto image processor for tag image processing and tag decoding in the context of the netpage networked computer system outlined in the cross-referenced patent applications listed above.

In a preferred embodiment of the invention, the Jupiter image sensor is designed to be embedded in a netpage sensing device such as a netpage pen (as described in co-pending PCT application WO 00/72230 entitled “Sensing Device”, filed 24 May 2000; and co-pending U.S. application Ser. No. 09/721,893 entitled “Sensing Device”, filed 25 Nov. 2000), or a Netpage viewer (as described in co-pending PCT application WO 01/41046 entitled “Viewer with Code Sensor”, filed 27 Nov. 2000).

In a preferred embodiment of the invention, the Jupiter image sensor is also designed to be used in conjunction with surfaces tagged with identity-coding and/or position-coding patterns (such as described in co-pending PCT application WO 00/72249 entitled “Identity-Coded Surface with Reference Points”, filed 24 May 2000; co-pending PCT application WO 02/84473 entitled “Cyclic Position Codes”, filed 11 Oct. 2001; co-pending U.S. application Ser. No. 10/309,358 entitled “Rotationally Symmetric Tags”, (docket number NPT020US) filed 4 Dec. 2002; and Australian Provisional Application 2002952259 entitled “Methods and Apparatus (NPT019)”, filed 25 Oct. 2002).

Various alternative pixel designs suitable for incorporation in the Jupiter image sensor are described in co-pending PCT application PCT/AU02/01573 entitled “Active Pixel Sensor”, filed 22 Nov. 2002; and co-pending PCT application PCT/AU02/01572 entitled “Sensing Device with Ambient Light Minimisation”, filed 22 Nov. 2002.

The preferred form of the invention is a monolithic image sensor, analog to digital converter (ADC), image processor and interface, which are configured to operate within a system including a host processor. The applicants have codenamed the monolithic integrated circuit “Jupiter”. The image sensor and ADC are codenamed “Ganymede” and the image processor and interface are codenamed “Callisto”.

It should be appreciated that the aggregation of particular components into functional or codenamed blocks is not necessarily an indication that such physical or even logical aggregation in hardware is necessary for the functioning of the present invention. Rather, the grouping of particular units into functional blocks is a matter of design convenience in the particular preferred embodiment that is described. The intended scope of the present invention embodied in the detailed description should be read as broadly as a reasonable interpretation of the appended claims allows.

Jupiter

Function and Environment

The Jupiter image sensor has been designed for high-speed low-cost machine vision applications, such as code sensing in devices such as the Netpage pen and Netpage viewer. Jupiter comprises an image sensor array, ADC function, timing and control logic, digital interface to an external microcontroller, and implementation of some of the computational steps of machine vision algorithms.

FIG. 1 shows a system-level diagram of the Jupiter monolithic integrated circuit 1 and its relationship with a host processor 2. Jupiter 1 has two main functional blocks: Ganymede 4 and Callisto 6. Ganymede comprises the sensor array, ADC, timing and control logic, clock multiplier PLL, and bias. Callisto comprises the image processing, image buffer memory, and serial interface to a host processor. A parallel interface 8 links Ganymede 4 with Callisto 6, and a serial interface 10 links Callisto 6 with the host processor 2.

Interfaces

Jupiter has several internal and external interfaces. External interfaces include the host processor interface and a flash (exposure) and capture interface. Both of these interfaces belong to Callisto and are described in more detail in the Callisto section below.

The internal interfaces in Jupiter are used for communication among the different internal modules. The internal interfaces in Jupiter are described in more detail below.

Power Modes

Each module in Jupiter has two power modes: SLEEP and ON. In the SLEEP mode, the modules are shut down, and in the ON mode the modules are activated for normal operation. The power is controlled via an internal 8-bit register. Each bit of this register is used to control one separate module. A bit value of 0 means that the associated module is turned off while a bit value of 1 means that the associated module is turned on.

Mechanical Characteristics

The packaging of Jupiter is performed using a wafer-level packaging technique to reduce the overall manufacturing cost. The physical placement of the pads and their dimensions, and the wafer-level die specifications, accommodate the wafer-level packaging process.

Ganymede Image Sensor

Ganymede features:

-   sensor array
-   8-bit digitisation of the sensor array output
-   digital image output to Callisto
-   a clock multiplying PLL

Ganymede Functional Characteristics

As best shown in FIG. 4, Ganymede 4 comprises a sensor array 12, an ADC block 14, a control and timing block 16 and a phase lock loop (PLL) 18 for providing an internal clock signal. The sensor array comprises pixels 20, a row decoder 22, a column decoder and MUX 24. The ADC block 14 includes an ADC 26 and a programmable gain amplifier (PGA) 28. The control and timing block 16 controls the sensor array 12, the ADC 26, and the PLL 18, and provides an interface to Callisto 6.

The following table shows characteristics of the sensor array 12:

ADC

The ADC block is used to digitise the analog output of the sensor array. The following table shows characteristics of the ADC:

| Parameter | Characteristic | Comment |
|---|---|---|
| Resolution | 8 bits | |
| Sampling frequency | — | For an N × N sensor array the sampling frequency is greater than 0.002/(N × N) Hz. |
| Integral non-linearity (INL) | <1 bit | |
| Differential non-linearity (DNL) | <0.5 bit | |
| Input voltage range | +/−1.0 | Differential input |
| Gain | 1 to 16 | The gain of the ADC is linearly set by a 4-bit register. |
| Offset | <0.5 bit | A calibration mechanism is implemented to reduce the offset. |
| Missing codes | NONE | |
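As a rough illustration of how the resolution, input range and gain figures above fit together, the following sketch models an ideal 8-bit conversion. Several points are assumptions not stated in the table: the input range is taken to be in volts, the 4-bit gain code is assumed to map to gain = code + 1, the gain is assumed to be applied before clipping, and the code mapping is assumed to be a plain offset-binary scale.

```python
def pga_gain_from_code(code):
    """Map a 4-bit register value to a gain of 1..16 (assumed mapping: gain = code + 1)."""
    if not 0 <= code <= 15:
        raise ValueError("gain code must fit in 4 bits")
    return code + 1


def ideal_adc_code(v_diff, gain_code=0, full_scale=1.0):
    """Ideal 8-bit quantisation of a differential input in the range +/- full_scale.

    Non-idealities listed in the table (INL, DNL, offset) are deliberately ignored.
    """
    v = v_diff * pga_gain_from_code(gain_code)
    v = max(-full_scale, min(full_scale, v))             # clip to the input range
    code = int((v + full_scale) / (2 * full_scale) * 256)
    return min(code, 255)                                # +full_scale maps to the top code
```

With these assumptions one code step corresponds to 2.0/256 of the input range at unity gain, so a small sensor signal benefits from a higher gain setting until it starts to clip.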

Clock Multiplying PLL

A clock multiplier within the PLL 18 provides a lock_detect output which indicates the PLL's lock status. The following table shows characteristics of the PLL:

| Parameter | Characteristic |
|---|---|
| Input clock frequency | 1 MHz < fin < 40 MHz |
| Output clock frequency | 10 MHz < fout < 200 MHz |
| Clock jitter | <200 ps |
| Lock time | <1 ms |

Image Sensor Interface

The image sensor interface is used internally in Ganymede to read the image sensor data. The interface between Ganymede and Callisto (represented by signals iclk, isync, ivalid, idata) is described below in more detail.

The following table shows the image sensor interface pins:

| Name | Function | Type |
|---|---|---|
| icapture | This signal triggers a frame capture sequence. | Digital input |
| sleep | This signal puts the image sensor to sleep. | Digital input |
| frame_reset | This signal resets the pixel voltage in FF mode. | Digital input |
| frame_capture | This signal captures the pixel voltage in FF mode. | Digital input |
| read_row | This signal triggers the download of a row of data and subsequently a series of ADC conversions for the data of that row. | Digital input |
| ar[7:0] | This is the row address bus. | 8-bit digital input |
| ac[7:0] | This is the column address bus. | 8-bit digital input |
| data_ready | This signal indicates that the analog output is ready. (This signal may be used to start a conversion in the ADC.) | Digital output |
| aout | This is the analog output data from the sensor which is input to the ADC. | Analog output |
| iclk | This is the clock signal. | Digital input |

FIG. 3 shows a timing diagram of image sensor event signals in a “Freeze-Frame” mode of the sensor array 12, whilst FIG. 4 shows a typical timing diagram of the image sensor interface during a read cycle. It should be noted that the number of clock pulses between events in all timing diagrams is for the purposes of illustration only. The actual number of clock cycles will vary depending upon the specific implementation.

ADC Interface

The control and timing block 16 provides timing and control signals to the ADC 26. The following table shows the ADC 26 pins.

| Signal | Function | Type |
|---|---|---|
| sleep | This puts the ADC to sleep. | Digital input |
| iclk | The clock. | Digital input |
| start_conv | A transition from low to high on this signal starts the conversion process. | Digital input |
| end_conv | A transition from low to high indicates that the conversion has ended. | Digital output |
| start_calibrate | A transition from low to high on this signal starts the calibration process in the next clock cycle. | Digital input |
| end_calibrate | A transition from low to high indicates that the calibration process has ended. | Digital output |
| pga_gain | The gain of the PGA amplifiers used at the input of the ADC. | 3-bit digital input |
| ain | The analog input to the ADC. | Analog input |
| dout[7:0] | The digital output of the ADC. | 8-bit digital output |

A typical timing diagram of the ADC interface during a conversion cycle is shown in FIG. 5. The conversion is triggered by the start_conv signal. During this period the analog inputs are also valid. The end_conv signal indicates the end of conversion, and the output digital data dout is then valid. The end_conv signal is set to low when the start_conv goes from low to high.
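The start_conv/end_conv handshake described above can be captured in a small behavioural model. The class and method names, and the `convert` callback standing in for the analog conversion, are hypothetical; clock-level timing is intentionally left out because the number of cycles depends on the implementation.

```python
class AdcHandshakeModel:
    """Behavioural model of the start_conv/end_conv handshake (illustrative only)."""

    def __init__(self, convert):
        self.convert = convert   # hypothetical function: analog input -> integer code
        self.start_conv = 0
        self.end_conv = 0
        self.dout = 0
        self._sampled = None     # analog value captured when the conversion was triggered

    def drive_start_conv(self, level, ain=0.0):
        """Drive start_conv; a low-to-high transition triggers a conversion of `ain`."""
        rising = level == 1 and self.start_conv == 0
        self.start_conv = level
        if rising:
            self.end_conv = 0    # end_conv is set low when start_conv goes high
            self._sampled = ain

    def finish_conversion(self):
        """Complete a pending conversion: dout becomes valid and end_conv goes high."""
        if self._sampled is None:
            return
        self.dout = self.convert(self._sampled) & 0xFF
        self.end_conv = 1
        self._sampled = None
```

A controller using this model would raise start_conv, wait for end_conv to go high, read dout, and only then lower start_conv in preparation for the next conversion.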

A typical timing diagram of the ADC interface during a calibration cycle is shown in FIG. 6. The start_calibrate signal triggers the calibration cycle. The period that it takes for the calibration to take place will depend on the particular architecture.

Clock Multiplying PLL Interface

The clock multiplier provides multiplication factors of the form M/N, where M and N are positive integer values. The following table shows the pins of the clock multiplier.

The timing of the clock multiplier is shown in FIG. 7. The time that it takes for the output clock frequency to settle is determined by the settling/lock characteristics of the clock multiplier as specified above.

Power/Sleep Interface

This interface controls the power state of the modules in Ganymede. Each module in Ganymede has a digital input pin, which turns the module on or off.

Operation

REGISTERS

This section describes the registers that are used in Ganymede. Note that Callisto's registers are described in Appendix E.

The address gaps between registers are intentional, to allow possible expansion during the design process, and also to facilitate the classification of registers and their functions.

Image Sensor frame_reset Timing Register

The reset value for the frame_reset_high corresponds to 1.6 us using a 20 MHz clock.

TABLE 7 Frame_reset timing register (32-bit)

| Field | Width | Bits | Reset value | Description |
|---|---|---|---|---|
| frame_reset_delay | 16 | 15:0 | 0x0000 | This is the delay, in number of clock pulses, between the rising edge of the frame_reset and the capture signals. (t1 in FIG. 3) |
| frame_reset_high | 16 | 31:16 | 0x0020 | This is the period, in number of clock pulses, when frame_reset is high. (t2 in FIG. 3) |

Image Sensor frame_capture Timing Register

The reset values correspond to 140 us and 1.6 us, respectively, using a 20 MHz clock.

TABLE 8 frame_capture timing register (32-bit)

| Field | Width | Bits | Reset value | Description |
|---|---|---|---|---|
| frame_capture_delay | 16 | 15:0 | 0x0B00 | This is the delay, in number of clock pulses, between the rising edge of the frame_capture and the capture signals. (t3 in FIG. 3) |
| frame_capture_high | 16 | 31:16 | 0x0020 | This is the period, in number of clock pulses, when frame_capture is high. (t4 in FIG. 3) |
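The relationship between the register contents and the quoted times can be checked with a short calculation. The sketch below assumes the 20 MHz clock mentioned in the text; the function names are illustrative and not part of the specification.

```python
CLOCK_HZ = 20_000_000  # 20 MHz clock, as quoted in the text


def unpack_timing(reg32):
    """Split a 32-bit timing register into (delay, high) fields, both in clock pulses."""
    delay = reg32 & 0xFFFF          # bits 15:0
    high = (reg32 >> 16) & 0xFFFF   # bits 31:16
    return delay, high


def cycles_to_us(cycles, clock_hz=CLOCK_HZ):
    """Convert a count of clock pulses to microseconds."""
    return cycles * 1e6 / clock_hz
```

With these helpers, the reset value 0x0020 (32 pulses) gives 1.6 us and 0x0B00 (2816 pulses) gives 140.8 us, which the text rounds to 140 us.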

ADC Calibration Output Register

This register contains the offset error value obtained after a calibration cycle.

TABLE 9 ADC offset register (8-bit)

Field | Width | Bits | Reset value | Description
ADC_offset | 8 | 7:0 | 0x00 | The offset of the ADC

Clock Multiplier Counter Register

TABLE 10 Clock multiplier counter register (8-bit)

Field | Width | Bits | Reset value | Description
PLL_count_M | 4 | 3:0 | 0x0 | The feedback divider ratio for the clock multiplier.
PLL_count_N | 4 | 7:4 | 0x0 | The forward divider ratio value for the clock multiplier.
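The sketch below shows how the two fields of the clock multiplier counter register could be unpacked and turned into an output frequency. It assumes that the field values are used directly as M and N (with no offset such as M-1), and that a zero in either field is invalid; neither point is stated in the text. The function names are hypothetical.

```python
def decode_pll_count(reg):
    """Split the 8-bit register into (M, N) using the bit positions of Table 10."""
    m = reg & 0x0F          # PLL_count_M, bits 3:0
    n = (reg >> 4) & 0x0F   # PLL_count_N, bits 7:4
    return m, n

def output_clock_hz(input_hz, reg):
    """Output frequency for a multiplication factor of M/N (assumed mapping)."""
    m, n = decode_pll_count(reg)
    if m == 0 or n == 0:
        # Assumption: the reset value 0x00 does not describe a usable ratio.
        raise ValueError("M and N must both be non-zero")
    return input_hz * m / n

print(decode_pll_count(0x25))             # (5, 2)
print(output_clock_hz(20_000_000, 0x25))  # 50000000.0
```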

Configuration Register

TABLE 11 Configuration register (8-bit)

Field | Width | Bits | Reset value | Description
ADC PGA gain | 4 | 3:0 | 0x0 | The gain of the PGA used in the ADC.
Calibrate | 1 | 4 | 0x0 | 0 to 1 = Perform internal calibration.
TBD | 3 | 7:5 | 0x0 | TBD

Status Register

This is a read-write register.

TABLE 12 Status register (8-bit)

Field | Width | Bits | Reset value | Description
Calibration Status | 1 | 0 | b'0 | Flags the completion of the internal calibration
Capture overflow | 1 | 1 | b'0 | Indicates that a new capture signal has arrived before the previous capture cycle has ended. Upon read, this register is reset to 0.
PLL Lock status | 1 | 2 | b'0 | 0 = Not in lock, 1 = In lock
TBD | 6 | 7:2 | 0x00 | TBD

Sleep Control Register

This register contains the sleep status for the associated modules/circuits. A value of 1 means that the circuit is off (in sleep mode), and a value of 0 means that the circuit is on (active mode).

TABLE 13 Sleep control register (8-bit)

Field | Width | Bits | Reset value | Description
Sensor | 1 | 0 | 0 | Image sensor sleep signal
ADC | 1 | 1 | 0 | ADC sleep signal
AUTO | 1 | 2 | 0 | Automatically turn off relevant image sensor circuits during the non-capture mode.
TBD | 5 | 7:3 | 0 | TBD
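A minimal sketch of how a host might compose a value for the sleep control register from the bit positions in Table 13. The constant and function names are hypothetical, and the TBD bits are simply left at zero.

```python
SLEEP_SENSOR = 1 << 0   # bit 0: image sensor sleep signal
SLEEP_ADC = 1 << 1      # bit 1: ADC sleep signal
SLEEP_AUTO = 1 << 2     # bit 2: automatic turn-off during non-capture mode

def sleep_control(sensor_off=False, adc_off=False, auto=False):
    """Build the 8-bit register value; 1 means off (sleep), 0 means on."""
    value = 0
    if sensor_off:
        value |= SLEEP_SENSOR
    if adc_off:
        value |= SLEEP_ADC
    if auto:
        value |= SLEEP_AUTO
    return value

print(hex(sleep_control(adc_off=True)))                   # 0x2
print(hex(sleep_control(sensor_off=True, adc_off=True)))  # 0x3
```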

Test Control Register

This register controls which signal is being connected to the PROBE pad, and also controls the test mode of Callisto. Notice that the PROBE pad is a direct analog pad which only has the protection circuits.

Each signal may be appropriately buffered before being connected to the PROBE pad. At any given time only one bit of this register shall be high.

TABLE 14 Test control register (16-bit)

Field | Width | Bits | Reset value | Description
Column circuit output/ADC input | 1 | 0 | b'0 | Connect the column circuit output and ADC input to PROBE
VBG | 1 | 1 | b'0 | Connect the bandgap generator output to PROBE
PLL input | 1 | 2 | b'0 | Connect the input clock to the PLL to PROBE
PLL feedback | 1 | 3 | b'0 | Connect the feedback clock (after the divider) to PROBE
PLL charge pump | 1 | 4 | b'0 | Connect the charge pump output to PROBE
PLL output | 1 | 5 | b'0 | Connect the PLL output clock to PROBE
PLL lock detect | 1 | 6 | b'0 | Connect the PLL lock detect output to PROBE
Bias 1 | 1 | 7 | b'0 | Connect the bias1 signal to PROBE
Bias 2 | 1 | 8 | b'0 | Connect the bias2 signal to PROBE
TBD | 6 | 14:9 | 0x00 | TBD
Callisto Test enable | 1 | 15 | 0x0 | Control the test (ten) mode of Callisto.

Operation Modes

Normal Operation

In this mode the start of the capture cycle is determined by the icapture signal.

The period of a capture cycle is determined by the period of the icapture signal. However, if a new capture signal arrives before the previous capture cycle has ended, the capture signal is ignored and the “Capture overflow” status flag is set high and remains high until it is explicitly cleared. The normal operation, however, resumes if a new capture signal arrives after the current capture cycle.
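The overflow rule can be illustrated with a small behavioural model in Python. This is only a sketch: the class, its method names and the fixed cycle length are assumptions made for illustration, and time is in arbitrary units.

```python
class CaptureController:
    """Toy model of the capture-overflow rule."""

    def __init__(self, cycle_length):
        self.cycle_length = cycle_length  # duration of one capture cycle (assumed fixed)
        self.busy_until = None            # end time of the most recently started cycle
        self.capture_overflow = False     # sticky status flag

    def icapture(self, now):
        """Handle a capture signal at time `now`; return True if a cycle starts."""
        if self.busy_until is not None and now < self.busy_until:
            self.capture_overflow = True  # signal ignored, flag latched
            return False
        self.busy_until = now + self.cycle_length
        return True

    def clear_overflow(self):
        """Explicitly clear the flag (for example by reading the status register)."""
        self.capture_overflow = False

ctrl = CaptureController(cycle_length=10)
print(ctrl.icapture(0))        # True: a capture cycle starts
print(ctrl.icapture(5))        # False: ignored, overflow flag is set
print(ctrl.icapture(12))       # True: normal operation resumes
print(ctrl.capture_overflow)   # True: flag stays high until cleared
```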

Reset Mode

When RESETB is set low, and iclk is toggling, Ganymede and all its components are reset, and all registers are reset to predefined values. The reset cycle takes only one clock cycle of iclk. The reset cycle is repeated as long as the RESETB pin is low.

Section C—Ganymede Design

A CMOS process offers several different photodetector structures, almost all present as parasitic devices. The main devices are photogate, vertical and lateral bipolar structures, and vertical and lateral diodes.

The preferred structure was chosen mainly on the basis of the estimated sensitivity of that structure in the 800-850 nm range. Sensitivity is a function of several parameters:

Quantum efficiency (dependent on junction profile)

Effective detector area (the effective area can be improved by using microlenses)

Pixel capacitance (which depends on the structure as well as the pixel circuits)

Among these, quantum efficiency plays a more important role in the selection of the structure, as the other two parameters are less dependent on the junction profile.

Pixel Circuits

This section describes the circuits used at each pixel. Here we only discuss the shuttered (or freeze-frame) pixel circuits, although unshuttered pixels can also be used. Two circuits commonly used for a shutter pixel are shown in FIGS. 8a and 8b. The difference between the two circuits is the location of the reset transistor M1 with respect to the storage node X. In both circuits M1 is the reset transistor, M2 is the transfer transistor, M3 is the output transistor, and M4 is the row-select transistor. The capacitor Cs is the storage capacitance, which may implicitly exist as parasitic capacitances at the storage node X. Alternatively, additional capacitance can be added to improve the charge retention capability of the pixel.

FIG. 9 shows a typical timing of the signals and voltages.

Notwithstanding their differences, the circuits of FIGS. 8a and 8b are almost identical with respect to sensitivity and dark current. This is because during the active period of the pixel (integration time) shown in FIG. 9, when M2 is on, the storage node X sees the same amount of capacitance and junction diodes.

The main difference between operation of the two circuits is during the reset period of the read cycle. For the circuit of FIG. 8a, the tx signal should also be on to allow the storage node to be reset, while the circuit of FIG. 8b does not require this. Also in the circuit of FIG. 8a, the photodetector current will lower the reset voltage at node X, and will induce an image dependent reset noise. However, during the reset period of the circuit of FIG. 8b, M2 can be turned off.

Reset Voltage Drop

A major problem faced by all active pixel circuits is the voltage drop when the reset voltage is lowered. In shuttered pixels there is also the voltage drop induced by the transfer transistor. It should be noticed that this voltage drop reduces the dynamic range of the pixel, and therefore is an undesirable effect. The voltage drop is caused by capacitive coupling between the gates of these transistors and the storage node.

Many alternatives have been suggested to remedy this problem, including increasing the reset voltage Vreset to account for the voltage drop, or using more complex read-out circuits. All of these alternatives bring their own set of undesirable side-effects.

FIG. 10 shows a preferred embodiment of a pixel design which reduces this problem. As shown, the storage node includes a capacitor, the other side of which is connected to txb, the logically negated version of tx. It will be appreciated that txb is a particularly convenient signal, in terms of timing and voltage, to use. However, any other suitable signal can be used to partially or wholly compensate for the voltage drop.

The value of the capacitor is chosen such that it compensates for substantially all of the voltage drop effects. Physically the capacitor can be implemented such that it covers the active circuits, so that it does not affect the fill factor of the pixel. For a typical 10 um×10 um pixel, the amount of capacitance needed to compensate for the voltage drop is about 0.2 fF. Compared to the total capacitance of 30-40 fF, this is negligible, and therefore it does not affect the sensitivity of the pixel.

Sensitivity

Before starting any discussions we define the “sensitivity” to avoid confusion with other implied meanings of this term. The term “sensitivity” used here is the conversion factor from input light power in Watts to output pixel voltage in Volts.

The main parameters determining sensitivity are the QE, pixel area, and effective pixel capacitance. In order to simulate the sensitivity we use the circuit shown in Figure. The input current sources are ratioed to reflect their respective QE at a wavelength of 850 nm. For a 1 Watt/m^2 input light at 850 nm the photon flux per unit area is:

$N = \frac{\lambda}{h c} = \frac{850 \times 10^{-9}}{6.63 \times 10^{-34} \times 3 \times 10^{8}} = 4.27 \times 10^{18}\ \frac{1}{\mathrm{s} \cdot \mathrm{m}^{2}}$

Using the simulated QE numbers for the Nwell-Psub and Pdiff-Nwell-Psub structures, we can conclude that for a 10 um pixel, with approximately 80% fill factor, the photocurrent for a 1 Watt/m^2 input light will be

$I_{NWell-Psub} = \frac{QE \times N \times FF \times A \times q}{t} = 0.123 \times 4.27 \times 10^{18} \times 0.8 \times 10^{-10} \times 1.6 \times 10^{-19} = 0.672 \times 10^{-11}$

$I_{Pdiff-NWell-Psub} = \frac{QE \times N \times FF \times A \times q}{t} = 2.28 \times 4.27 \times 10^{18} \times 0.8 \times 10^{-10} \times 1.6 \times 10^{-19} = 13.5 \times 10^{-11}$
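The photon flux and the Nwell-Psub photocurrent can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names are illustrative, the constants are the rounded values used in the text, and the irradiance is taken as 1 W/m^2 on a 10 um × 10 um pixel with 80% fill factor.

```python
H = 6.63e-34   # Planck constant in J*s (rounded value used in the text)
C = 3e8        # speed of light in m/s
Q = 1.6e-19    # electron charge in coulombs

def photon_flux(wavelength_m, irradiance_w_m2=1.0):
    """Photons per second per square metre for monochromatic light."""
    return irradiance_w_m2 * wavelength_m / (H * C)

def photocurrent(qe, flux, fill_factor, pixel_area_m2):
    """Photocurrent in amperes for a given quantum efficiency and photon flux."""
    return qe * flux * fill_factor * pixel_area_m2 * Q

flux = photon_flux(850e-9)
i_nwell = photocurrent(0.123, flux, 0.8, (10e-6) ** 2)
print(f"{flux:.3g}")     # about 4.27e18 photons per second per m^2
print(f"{i_nwell:.3g}")  # about 0.67e-11 A
```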

In order to estimate the sensitivity we can use these values in a transient simulation. However, as most SPICE simulators are not tailored for low current simulations to give accurate “current” outputs, and the available simulators could not converge, we will use a different method to estimate the effective capacitance at the storage node, and then deduce the sensitivity. We use AC simulations. By applying an AC voltage at the storage node, and then measuring the drawn current, we can find an estimate for the total capacitance.

From the simulations the total capacitance at the storage node is 31 fF and 40 fF for the Nwell-Psub and Pdiff-Nwell-Psub structures, respectively. The sensitivity of the devices can be calculated to be 21.6 and 337.5 V.s/W for the respective structures.

Area Dependence

We have found that sensitivity improves only as a function of fill factor, and is relatively constant for pixel dimensions larger than 10 um.

Column Circuit

A column circuit 30, as shown in FIG. 11, is present at each column of the sensor array 12. At the end of an integration cycle, the column circuit 30 is activated. The rows are sequentially multiplexed to the input of this circuit. The illustrated circuit performs buffering in addition to pixel level and column level correlated double sampling (CDS).

In the column circuit 30, the source-follower transistor and the read_row transistor are connected to three other transistors in such a way as to form a basic unity-gain buffer. This circuit is advantageous over the traditional source-follower structure, as it provides a gain closer to one, and therefore reduces the dynamic range loss from the pixel. The output of the first buffer is sampled twice, using two identical sample-and-hold structures. The sampling is first done by activating the signal_hold, and storing the value on Cr. Then all pixels in the row are reset, and the reset value is sampled, this time onto the Cs capacitor. This operation performs the pixel level CDS.

During the period when the sampling is performed, the cro signal is set high, and in effect resets the output buffer circuits following the nodes Xr and Xs. Once sampling has finished, the cro signal is set low and the sampled signals are transferred to Xr and Xs, and buffered to the outputs. This operation performs column level CDS.

It should be mentioned that the circuit following the sensor (either a PGA or ADC) should be designed such that it can benefit from the column level CDS mechanism, i.e. it can process the outputs from the two different phases of cro.
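One way a downstream stage could combine the two phases of cro is sketched below in Python. This is an illustration only: the function name, the argument names and in particular the sign convention of the final subtraction are assumptions, since the text does not specify how the following circuit processes the outputs.

```python
def double_cds(signal_offset, reset_offset, signal_value, reset_value):
    """Combine column-level and pixel-level CDS for one pixel.

    signal_offset, reset_offset: outputs sampled while cro is high (column offsets)
    signal_value, reset_value:   outputs sampled after cro goes low
    """
    signal = signal_value - signal_offset   # column-level CDS on the signal path
    reset = reset_value - reset_offset      # column-level CDS on the reset path
    return reset - signal                   # pixel-level CDS (sign is an assumption)

# Example with arbitrary integer codes
print(double_cds(signal_offset=10, reset_offset=12, signal_value=110, reset_value=212))  # 100
```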

Column Decoder

The column decoder is part of the column circuit 30. It implements an N-to-2^N decoder, and as such it can be used in a random access mode.

Timing

The timing of the signals controlling the pixel and column circuits can be separated into alternating integration and read-out cycles.

During each integration cycle 32, the entire sensor array 12 is first reset and then the electronic shutter is left open to integrate the photocurrent. At the end of this cycle the shutter is closed and the integrated charge is stored in the pixel. In the read-out cycle 24 the stored charge is read out row by row and the pixel-level and column-level CDS is performed, and the output is read out pixel by pixel.

The timing diagram for the integration cycle 32 is shown in more detail in FIG. 12. The main signals during this cycle are the reset and tx signals. These signals act on all pixels in the sensor array.

The read-out cycle is more complex as it involves several different operations. FIG. 13 shows the sequence of events and the timing diagram during the read-out cycle. The read-out cycle essentially consists of a series of “read and CDS row(n)” cycles 36, for all rows of the sensor array 12. Each “read and CDS row(n)” cycle 36 in turn consists of a “sample row data” period 38, a “pixel CDS” 40, and a series of “column CDS” cycles 42. During the “sample row data” period 38, first signal_hold is set high, and the data is sampled and held by its corresponding capacitor. In the next phase, the entire row of pixels is reset and the reset value is sampled and held by its associated capacitor. The row decoder circuit is designed such that it supports the resetting of only one row of pixels during the read-out cycle, while it globally resets the pixel array during the integration cycle. The pixel CDS 40 is inherently done during this same cycle.

During each of the “column CDS” cycles 42, first the signal cro is set high to provide the off-set component of the column circuits, and then cro is set low to transfer the sampled signal and reset values to the output. This operation is repeated for all the columns in the sensor array 12.

Row Decoder

Turning to FIG. 14, a row decoder 44 is responsible for providing multiplexing signals for the rows, and also controlling the behaviour of the reset and tx signals. The decoding is performed by a NOR-NAND structure 46.

The dec_enable signal controls the behaviour of the reset and tx signals. When dec_enable is low, the entire row decoder is disabled and none of the rows are activated. At the same time, the reset and tx signals will take a global role and can be active on all rows.

As the row decoder 44 implements an N-to-2^N decoder, it can be used in a random access mode.
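The decoding behaviour can be sketched in Python, assuming the decoder maps an N-bit address onto one of 2^N select lines and that a low dec_enable deselects every row. The function name and list representation are illustrative and say nothing about the NOR-NAND implementation.

```python
def row_decoder(address, n_bits, dec_enable=True):
    """Return the 2**n_bits row-select lines as a list of 0/1 values."""
    n_rows = 2 ** n_bits
    if not 0 <= address < n_rows:
        raise ValueError("address out of range")
    if not dec_enable:
        return [0] * n_rows   # decoder disabled: no row is activated
    return [1 if row == address else 0 for row in range(n_rows)]

print(row_decoder(2, n_bits=2))                    # [0, 0, 1, 0]
print(row_decoder(2, n_bits=2, dec_enable=False))  # [0, 0, 0, 0]
```

Because any address can be presented at any time, rows can be selected in arbitrary order, which is what makes random access possible.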

Level shifter buffers 48 are used to translate the logic levels from VDD to VCC (in this design from 1.8 V to 3.0 V). FIG. 15 shows one of the level shift buffers 48. The level shift buffer uses a basic feedback level shifter, which is properly ratioed to avoid any potential latch-up during fast transitions. In this circuit, except for the two inverters, all other transistors are designed with the high voltage option. Notice that output PMOS transistor 50 has been intentionally made weaker than NMOS 52, to remove any possible overlap between the outputs from two consecutive rows when switching from one row to the next.

Biasing

The only circuits that require biasing are the column circuits 30. There are four biasing voltages that need to be generated: two for the input buffer (biasn and biasp), and two for the output buffer (biasn_out and biasp_out) (see FIG. 11).

FIG. 16 shows the generator circuitry, comprising basic resistor-based bias generators.

Layout Design

The layout design of the sensor is described in this section. The most important part of the layout design is the pixel design, and the interacting layouts surrounding the pixel array.

A VSS ring, which also has the Psubstrate tap, surrounds the pixel array. This is to ensure that the NMOS transistors within the pixel array receive the best possible substrate biasing, as there is no Psubstrate tap inside the pixels to conserve area.

Pixel Layout

The layout of the pixel should be such that the effective photodetector area is maximised. In the following section we present the layout design of the four different pixel structures that have been selected as alternative candidates for use in the Jupiter design.

Photodiode with Capacitor

FIG. 17 shows a layout of a 10 um pixel using a photodiode and also having the capacitor for compensating the reset voltage drop as described above.

The photodiode is an NWell-Psub structure, including a central NWell connection, from which the silicide layer is removed (except where the contact to M1 is formed). The VCC supply voltage runs both horizontally and vertically to produce a mesh power structure, which reduces the impedance of the supply planes significantly.

The read, reset, tx and txb signals run horizontally. The out signal runs vertically. The capacitor has been highlighted in the figure. It is formed by the parasitic capacitance between M4 and M5. “txb” runs on M5, and has been widened where the capacitor is formed. The bottom plate, which is on M4, is connected to the storage node through a set of stacked vias. For the specific value required for the capacitor, it turns out that the implemented capacitor covers all the active area of the transistors, and therefore it also provides a natural shield for these circuits.

For the illustrated 10 um pixel, the fill factor is approximately 87%.

Photodiode without Capacitor

FIG. 18 shows a layout of a 10 um pixel using a photodiode. The pixel is almost identical to that shown in FIG. 17, without the capacitor. There is no M4 below the area where txb has been widened, and therefore no capacitance is formed.

Photo-BJT with/without Capacitor

FIG. 19 shows a layout of a 10 um pixel using a Pdiff-NWell-Psub BJT as the photodetector. The layout is very similar to those using a photodiode. The pixel circuit is identical to that used in the photodiode based pixels, and therefore it will not be described here again.

The Pdiff area in this case has been maximized to increase the emitter area. The silicide has been removed from the Pdiff area, except where the emitter contact is made.

Power Routing

A VSS ring, which also has the Psubstrate taps, surrounds the pixel array. This is to ensure that the NMOS transistors within the pixel array receive the best possible substrate biasing, as there is no Psubstrate tap inside the pixels. A VCC ring also surrounds the array, mainly to ensure that VCC is supplied from all sides of the array to the pixels.

The VCC supply in the pixels runs both horizontally and vertically, to produce a low impedance supply mesh. The power routing to the row and column decoders is provided using the top metal layers from M3 to M6.

Light Shielding

The most critical circuits in any image sensor that may be affected by the incoming light are the row and column driving circuits, simply because they are physically close to the pixel array and therefore will be exposed to light. In order to avoid any potential problems, all the circuits in the current design are covered by metal layers. Notice that the design rules do not allow the use of a single continuous layer of metal, and therefore multiple overlapping metal layers have been used to cover the circuits in the preferred embodiment.

It is also worth mentioning that in the 800 nm+ range of input wavelength, only NMOS transistors can potentially be affected by the light, as the PMOS transistors are inside an NWell and have an intrinsic barrier for the photo-generated carriers, which are generated deep in the silicon bulk. Nevertheless, all circuits have been shielded in the preferred embodiment.

Interface

FIG. 20 shows the block diagram of the image sensor. The sensor consists of an M×N pixel array 54, an array of N row decoder circuits 56, an array of M column decoder circuits 58, and a bias circuit 60.

The size and the number of pixels can be designed according to the required specification.

Operation

This section describes basic steps to operate the sensor. The image sensor only supports one operation mode, which is the normal mode.

In order to operate the sensor in the normal mode the following steps are followed:

-   1. Set all the digital input signals to low.
-   2. Apply the appropriate VDD, VCC, and VSS supply voltages.
-   3. Set the Enable_bias input to high, and wait for at least 1 us. This step may be bypassed if the Enable_bias has already been set high.
-   4. Set the tx input to high.
-   5. Set the reset input to high. This will reset all pixels in the array.
-   6. Wait for the desired integration time.
-   7. Set the tx input to low. This will close the shutter and store the image at the storage node.
-   8. Set the “row” address bus to the desired starting address.
-   9. Set the “col” input address bus to the desired starting address.
-   10. Set the row_dec_enable and col_dec_enable both to high.
-   11. Set the signal_hold to high.
-   12. Set the signal_hold to low.
-   13. Set reset to high.
-   14. Set reset_hold to high.
-   15. Set reset_hold to low.
-   16. Set the cro to high. At this time the two output signals, signal_out and reset_out, will have the column offset value.
-   17. Set cro to low. At this time the two output signals will have the pixel signal and reset values.
-   18. Change the “col” address bus to the next desired value, and repeat the steps from Step 16 to Step 18, up to the last desired column address.
-   19. Change the “row” address bus to the next desired value, and repeat the steps from Step 11 to Step 19, up to the last desired row address.
-   20. If the sensor is to be disabled, set all the digital inputs to low. However, if the sensor is to remain enabled, set all digital inputs except Enable_bias to low.
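The twenty steps can be written out as a control sequence. The Python sketch below only generates the ordered list of operations and does not drive any hardware; the tuple format, the function name and the `keep_enabled` parameter are assumptions made for illustration. It also assumes that the column address returns to its starting value at each new row, and it does not add any signal changes that the steps do not list.

```python
def frame_sequence(rows, cols, integration_us, keep_enabled=True):
    """Yield (operation, target, value) tuples for one frame, following steps 1 to 20."""
    yield ("set_all_digital_inputs", None, 0)        # step 1
    yield ("apply_supplies", "VDD/VCC/VSS", None)    # step 2
    yield ("set", "Enable_bias", 1)                  # step 3
    yield ("wait_us", None, 1)                       # step 3: wait at least 1 us
    yield ("set", "tx", 1)                           # step 4
    yield ("set", "reset", 1)                        # step 5
    yield ("wait_us", None, integration_us)          # step 6
    yield ("set", "tx", 0)                           # step 7
    yield ("bus", "row", rows[0])                    # step 8
    yield ("bus", "col", cols[0])                    # step 9
    yield ("set", "row_dec_enable", 1)               # step 10
    yield ("set", "col_dec_enable", 1)               # step 10
    for r, row in enumerate(rows):
        if r > 0:
            yield ("bus", "row", row)                # step 19
        yield ("set", "signal_hold", 1)              # step 11
        yield ("set", "signal_hold", 0)              # step 12
        yield ("set", "reset", 1)                    # step 13
        yield ("set", "reset_hold", 1)               # step 14
        yield ("set", "reset_hold", 0)               # step 15
        for c, col in enumerate(cols):
            if r > 0 or c > 0:
                yield ("bus", "col", col)            # step 18
            yield ("set", "cro", 1)                  # step 16: outputs hold column offset
            yield ("set", "cro", 0)                  # step 17: outputs hold signal and reset
    if keep_enabled:
        yield ("set_all_digital_inputs_except", "Enable_bias", 0)  # step 20
    else:
        yield ("set_all_digital_inputs", None, 0)                  # step 20

steps = list(frame_sequence(rows=[0, 1], cols=[0, 1, 2], integration_us=1000))
print(len(steps))   # 41
print(steps[-1])    # ('set_all_digital_inputs_except', 'Enable_bias', 0)
```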

Disabling the Sensor

In order to disable the sensor at any time, the Enable_bias, col_dec_enable, and row_dec_enable signals are set to low. The reset and tx signals should also be set to low, otherwise the sensor may dissipate power.

8-Bit ADC Design

ADC Architecture

The selection of an appropriate architecture for the ADC is a critical step in achieving a reliable design and silicon performance. A fully differential pipelined ADC design is used in the preferred embodiment. A redundant signed digit (RSD) structure is used because it presents an inherent self-correcting function due to the redundant nature of the operation, and because it is relatively tolerant to offset error in comparators, which is the major source of error in other ADC structures.

FIG. 21 shows the structure of a pipelined RSD ADC 62. It consists of identical stages, each of which has an analog input, an analog residue output and two digital outputs.

In an RSD based pipeline ADC, in the first step the input is compared against two levels. These two levels are often chosen at +Vref/4 and −Vref/4. If the input is above both levels the input is reduced by Vref/2 and then amplified by a factor of 2. If the input is between the two levels, the input is directly amplified. And finally, if the input is below both levels, the input is increased by Vref/2 and then amplified by a factor of 2. The input-output equations for one stage of the pipeline are

$\begin{matrix}{{if}\left( {Vin} > \frac{Vref}{4} \right)} & {{BP} = 1,\ {BN} = 0} & {{Vout} = 2\left( {Vin} - \frac{Vref}{2} \right)} \\ {{if}\left( -\frac{Vref}{4} < {Vin} < \frac{Vref}{4} \right)} & {{BP} = 0,\ {BN} = 0} & {{Vout} = 2\,{Vin}} \\ {{if}\left( {Vin} < -\frac{Vref}{4} \right)} & {{BP} = 0,\ {BN} = 1} & {{Vout} = 2\left( {Vin} + \frac{Vref}{2} \right)}\end{matrix}$

Vin is the analog input, BP and BN are the digital outputs, and Vout is the analog residue output.
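Following the prose description of a single stage (compare against +Vref/4 and −Vref/4, shift by Vref/2 where needed, then amplify by 2), an ideal stage can be sketched in Python as follows. The function name is illustrative, and the handling of an input exactly equal to a threshold is an assumption.

```python
def rsd_stage(vin, vref):
    """One ideal RSD pipeline stage: returns (BP, BN, Vout)."""
    if vin > vref / 4:
        return 1, 0, 2 * (vin - vref / 2)   # above both levels: subtract Vref/2
    if vin < -vref / 4:
        return 0, 1, 2 * (vin + vref / 2)   # below both levels: add Vref/2
    return 0, 0, 2 * vin                    # between the levels: amplify directly

print(rsd_stage(0.5, 1.0))    # (1, 0, 0.0)
print(rsd_stage(0.1, 1.0))    # (0, 0, 0.2)
print(rsd_stage(-0.5, 1.0))   # (0, 1, 0.0)
```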

In order to convert the digital outputs of each stage we should remember that an output of BP=1, BN=0 means that this digit has a value of +1, BP=0, BN=0 has a value of 0, and BP=0, BN=1 has a value of −1. For example the four-bit RSD number (+1)(−1)(0)(−1) is equal to

(1×8)+(−1×4)+(0×2)+(−1×1)=3

Notice that we can represent 3 as (0)(0)(1)(1), hence we have a redundant representation.

The RSD digital outputs from all stages are then converted to a two's complement number system.
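The conversion of RSD digits into a conventional number can be sketched in Python. The helper names are illustrative; the second function shows one common way of expressing the result as a two's complement bit pattern of a chosen width, which is an assumption about the output format rather than a description of the actual converter.

```python
def rsd_digits_to_int(digits):
    """Weight RSD digits (+1, 0, -1), most significant first, by powers of two."""
    value = 0
    for digit in digits:
        value = 2 * value + digit
    return value

def to_twos_complement(value, bits):
    """Return the two's complement bit pattern of `value` as an unsigned integer."""
    return value & ((1 << bits) - 1)

print(rsd_digits_to_int([+1, -1, 0, -1]))       # 3
print(rsd_digits_to_int([0, 0, +1, +1]))        # 3, same value from different digits
print(format(to_twos_complement(-3, 4), "04b")) # 1101
```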

Implementation

The ADC bit-slice can be implemented using switched capacitor circuits. In this approach the input to each stage is first sampled on two capacitors Cs (sampling capacitor) and Cf (feedback capacitor). At the same time the input is compared against two levels and the digital bits are extracted. In the second phase, the capacitors are connected to an opamp to form an amplifier with a gain of 2.

For higher resolution ADCs (more than 8 bits) or for mixed signal designs, a differential approach is used, to reduce the effects of charge injection and substrate coupling.

FIG. 22 shows the structure of one bit slice, and FIG. 23 shows the capacitor connections in three bit slices of the ADC in one cycle.

A critical component of the bit-slice is the operational amplifier 64. The gain, speed, and power dissipation of the opamp determine the overall performance of the ADC. A fully-differential folded-cascode structure was chosen for this design for the following reasons.

Folded-cascode often does not require compensation.

The gain of a folded-cascode opamp can be improved using gain-boosting techniques.

The optimization of the opamp is simpler due to the smaller number of transistors in the circuit.

The biasing of the opamp can be varied without affecting the stability. Therefore, if a lower speed ADC is required the bias current can simply be reduced to lower the power dissipation.

FIG. 24 shows a simplified circuit diagram of the folded cascode opamp 64. Not shown in this Figure is the common-mode feedback circuit, which forces the common-mode voltage at the output nodes to a predefined value.

This circuit is simplified for illustrative purposes and does not represent the overall complexity involved in the design. In the following sections the design of each major component is described and the justifications for using a particular circuit are explained.

Biasing

The biasing circuit provides biasing voltages that are used throughout the ADC bit-slices, and also in the PGA. The choice of the biasing voltages is crucial. In general a trade-off between area (size of bias transistors) and power dissipation (the bias currents) should be made. FIG. 25 shows the biasing circuit. The roles of the bias voltages in the opamp are as follows:

biasn[1] This voltage is used to determine the bias current in the input branch and the NMOS transistors, MN1 and MN2.

biasn[2] This voltage is used for the folded cascode opamp, and determines the effective DC bias voltage across MN1 and MN2.

biasp[1] This voltage is used to determine the bias current in PMOS transistors MP1 and MP2.

biasp[2] This voltage is used for the folded cascode opamp, and determines the effective DC bias voltage across the PMOS transistors MP1 and MP2.

In the actual implementation the sizes of the transistors have been optimized such that the VDS voltages are always at least 0.1 volts above the VDS,sat of the bias transistors in the folded structure. This is to ensure that these transistors are always in the saturation region.

The input current to the bias generator is provided by the reference current generator described below.

Common Mode Circuit

The common mode feedback circuit (CMFB) forces the outputs of the folded opamp to have a predefined common-mode voltage. This circuit effectively tries to change the biasing conditions through a feedback loop. FIG. 26 shows the implemented CMFB circuit.

The differential output of the opamp is used in a capacitive divider to find the common mode voltage of the output. This voltage is then fed back into two differential pairs, which control a current that is injected into the NMOS branch. The other input of the differential pairs is connected to the common mode voltage VCM. This feedback mechanism effectively sets the common mode voltage at the output to VCM. The size of the capacitors Ccmfb in this circuit is only about 50 fF.

The dynamics of the CMFB directly affect the dynamics of the opamp, and therefore during circuit optimization special attention should be paid to the CMFB circuit. Also notice that the CMFB circuit has a different feedback loop, and therefore its dynamics are almost isolated from the dynamics of the opamp.

Gain Boosting Amplifiers

In order to increase the gain of the folded cascode opamp, gain boosting stages are required. The overall gain of the folded cascode stage without gain boosting is less than 100. This is because the cascode transistors have minimum length (0.18 um) to achieve a high bandwidth for the opamp. To increase the gain of the opamp beyond the minimum requirement (which is at least 2^N = 2^8 = 256) the gain boosting stages should have a gain of at least 10. This amount of gain can easily be obtained from basic OTAs, as shown in FIG. 27.

These amplifiers have been implemented such that they can be turned off. In addition to the power savings achieved by doing this, the output voltage when the circuit is disabled will be set to a value that turns off the transistor that it is connected to. For example, during the off period the output of the top opamp in the figure will be pulled high to Vdd, and therefore the PMOS transistor driven by the output will be turned off.

This turning off mechanism reduces the pressure on the voltage source used to set the common mode voltage at the output of the opamp. In fact when the gain boosting amplifiers are turned off, the output of the opamp will be floating, and the output can be set to any desired value.

An important point in the design of these stages is that their bandwidth should be much more than the overall bandwidth of the main opamp, as otherwise they will form additional poles in the circuit and reduce the phase margin. The bandwidth of the opamp has been designed to exceed 300 MHz. For an N-bit pipeline ADC the required bandwidth is approximately

Therefore, a bandwidth of about 1 GHz is required for these amplifiers. This in turn translates into a large biasing current. A relatively large proportion of the power in the ADC is consumed by these amplifiers.

Clock Generator

The clock generator 66 produces all the clock phases necessary for the operation of the ADC 26. The circuit is essentially a two-phase clock generator, and extra clock phases are also generated.

FIG. 28 shows the clock generator 66, each branch of which generates a series of delayed clock phases. Each of these clock phases is used to control the sequence of events in the pipelined ADC. Notice that the clock phases alternate between the stages of the ADC.

Reference Current Generator

As shown in FIG. 29, the reference current generator 68 uses a resistor R with a known value, and a reference voltage. This circuit requires a well controlled resistor. In order to maintain good control over the bias current against resistor tolerance the resistor in the preferred embodiment has been implemented as a digitally switched resistor ladder, as shown in FIG. 30. Each ladder consists of 16 equal resistors. The value of these resistors is chosen such that the total resistance in the middle of the ladder is equal to the required resistance.
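The benefit of centring the required resistance in the middle of a 16-resistor ladder can be illustrated with a small Python sketch. The selection rule, the function name and the `process_scale` parameter (actual resistance divided by nominal resistance) are assumptions for illustration; the text does not describe how the tap is chosen.

```python
def select_tap(required_ohms, process_scale, n_resistors=16):
    """Pick the number of series unit resistors that best matches the target.

    The nominal unit value is chosen so that half the ladder equals the target.
    """
    unit_nominal = required_ohms / (n_resistors // 2)
    unit_actual = unit_nominal * process_scale
    tap = round(required_ohms / unit_actual)
    tap = max(1, min(n_resistors, tap))      # stay within the ladder
    return tap, tap * unit_actual

print(select_tap(10_000, 1.0))   # (8, 10000.0): nominal process, middle tap
print(select_tap(10_000, 1.2))   # (7, 10500.0): resistors 20% high, use fewer
```

With the target in the middle of the ladder, the tap can move in either direction, so both positive and negative resistor tolerances can be partly corrected.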

Differential Comparators

For each stage of the ADC two comparators are required. FIG. 31 shows one of these differential comparators 68. Each comparator 68 compares the differential input against a differential reference voltage (Vrefp/4 and Vrefn/4). A switched capacitor structure 70 has been used in this design, which removes the need for generating the Vrefp/4 and Vrefn/4 signals.

The switched capacitor structure 70 is followed by two cross coupled differential pairs 72, which act as the main comparator stages.

The reason for using two stages is that the input capacitors are relatively small to reduce the loading on the opamps in the bit slice. This in turn dictates the use of smaller input transistors for the first stage, and therefore the available gain from only one stage would be low. The second stage ensures that the overall gain is high enough to avoid meta-stable states.

The output from the differential pairs is passed to a latched RS flip-flop 74, which ensures that the output does not change before and after the decision has been made, and also makes sure that the two outputs are always inverted, which may not be the case if a meta-stable state occurs.

Common Mode Generator

In order to generate the common mode and reference voltages necessary for the operation of the ADC, a common-mode generator is designed.

The common mode voltage is derived from an inverter with self feedback. The advantages of this circuit are its simplicity, and automatic tracking of the supply voltage and process corners. The switch is used to cut off the feedback during the sleep mode, to avoid power dissipation (see FIG. 32).

Reference Voltage Generator

An opamp-based circuit using resistors in the feedback loop is used to derive the Vrefp and Vrefn, as shown in FIG. 32. The reference voltages Vrefp and Vrefn can be obtained as:

$Vrefp = Vcm + \frac{Vref}{2}$

$Vrefn = Vcm - \frac{Vref}{2}$

For a reference voltage of 1.0 volt, we will have Vrefp=Vcm+0.50, and Vrefn=Vcm−0.50.

The Vref reference voltage is generated by a bandgap generator set to output 1.0 volt (see below for more detail).

The opamps used in this circuit are based on a wide-range OTA design, to achieve medium gain and high stability in the presence of large capacitive loading. Note that the Vrefp and Vrefn are used as inputs to the opamp in the second phase of conversion. They are also heavily decoupled using large MOS capacitors to reduce the bouncing on these voltages. The circuit is shown in FIG. 33. Miller compensation has been used to ensure stability. The current design is stable with capacitive loads of more than 30 pF.

Bandgap Voltage Generator

The bandgap generator produces the main reference voltage from which the Vrefp and Vrefn voltages are derived. It is also used for generating the reference current used in the bias circuit.

FIG. 34 shows the structure of the bandgap generator. The resistor values have been chosen to produce an output voltage of approximately 1.0 volt. This means that the bandgap generator is in fact out of balance and the output voltage will be temperature dependent. This is in fact a desirable feature for this design. At higher temperatures the dynamic range (or voltage swing) of all circuits in the chip will reduce.

Therefore, if the reference voltage is constant, the required dynamic range of circuits will be higher than what they can achieve. For example, the dynamic range at the output of the image sensor will be lowered at higher temperatures. With a constant reference voltage, the reference levels for the ADC will be constant, and therefore, the ADC will be forced to provide more dynamic range than required.

However, if the reference voltage has a negative temperature coefficient, then the biased circuits will be automatically adjusted to lower biasing currents and voltages, and the amount of dynamic range discrepancy will be reduced.

The opamp used in the bandgap generator is a three-stage wide-range OTA, as shown in FIG. 34. This choice is to increase the gain of the opamp and increase the supply rejection. Compensation is necessary in this opamp. A nested Miller compensation has been used, to reduce the size of the compensation capacitors.

Programmable Gain Amplifier

At the input of the ADC a digitally programmable amplifier has been implemented. This PGA can have gain values from 0.5 to 8 in steps of 0.5. The structure uses a switched capacitor design. The simplified schematic diagram is shown in FIG. 36. In the first phase the input is sampled onto capacitors Cs. Also other capacitors are precharged to known values. In the second phase the capacitors are connected to the opamp and form an amplifying stage. In the first phase of the clock the switches connected to Φ1 are closed, and in the second phase those connected to Φ2.

Using charge conservation equations we can find

$Voutp - Voutn = \left(Voffsetp - Voffsetn\right) + \frac{Cs}{Cf}\left(Vinp(1) - Vinn(1)\right) - \frac{Cs}{Cf}\left(Vinp(2) - Vinn(2)\right)$

where Vinp(1) and Vinn(1) are the input values during Φ1, and Vinp(2) and Vinn(2) are the input values during Φ2.

This particular structure has been chosen to facilitate correlated double sampling (CDS) in the image sensor. During CDS, in the first phase of the clock the signal value is present, and in the second phase the reset value. The values are subsequently subtracted.

The capacitor Cf in this design is 100 fF. Capacitor Cs is a linearly selectable capacitor as shown in FIG. 37. In this figure Cs1 represents a unit capacitance of 50 fF.
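The relationship between the capacitor values and the gain range can be checked numerically. The following is a minimal sketch, assuming that gain code 0 selects one unit capacitor Cs1 and each increment adds one more (which reproduces the stated range of 0.5 to 8 in steps of 0.5); the function names are illustrative and not part of the design.

```python
C_F_FF = 100   # feedback capacitor Cf in femtofarads (value from the text)
C_S1_FF = 50   # unit sampling capacitance Cs1 in femtofarads (value from the text)


def pga_gain(gain_code: int) -> float:
    """Nominal PGA gain Cs/Cf for a 4-bit gain code.

    Assumption: code 0 selects one unit capacitor and each increment
    adds one more unit capacitor in parallel.
    """
    if not 0 <= gain_code <= 15:
        raise ValueError("gain code must fit in 4 bits")
    c_s_ff = (gain_code + 1) * C_S1_FF
    return c_s_ff / C_F_FF


def pga_output(gain_code: int, phase1_diff: float, phase2_diff: float,
               offset_diff: float = 0.0) -> float:
    """Differential output from the charge conservation result:
    offset + (Cs/Cf) * (phase-1 input - phase-2 input)."""
    return offset_diff + pga_gain(gain_code) * (phase1_diff - phase2_diff)
```

With these values `pga_gain(0)` gives 0.5 and `pga_gain(15)` gives 8.0, matching the stated gain range.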

PGA Opamp

The opamp used in the PGA is very similar to that used in the ADC bit slices. There are, however, two main changes in this opamp. One is the use of larger transistors, mainly to increase the bandwidth of the opamp, and the other is the use of a basic Miller compensation structure at the output branch, as shown in FIG. 38. Instability in the PGA arises from several factors. The first is the larger gain-bandwidth product required in the opamp. This brings the poles at the output branch close to other poles in the circuit, such as those at the output of the gain boosting OTAs. Also the size of the feedback capacitors is relatively small, to limit the total input capacitance when the gain is at its maximum. The compensation structure tries to bring the poles at the output of the gain boosting OTAs down, and also adds a zero (by adding the series Rcomp resistor), to cancel one of the poles.

Synchronizer

The outputs from the bit slices are generated in a pipeline. During each phase of the clock one bit slice generates an output. In order to synchronize the outputs, synchronizing latches are used. These latches are in fact half of a D flip-flop, and are driven by the Phi1[0] and Phi2[0] clock phases (see FIG. 38). The final latches are clocked by Phi2[0]. This means that the output will be valid after the negative edge of Phi2[0], and it can be sampled safely on the negative edge of the input clock.

Before the last latch there is a code correction logic, which is described in the next section.

Output Code Correction

The RSD output of the pipeline ADC often needs to be converted to more conventional binary representations, such as two's complement or signed representations.

As RSD is a redundant representation, and in a pipeline ADC different representations of the same value may occur because of errors in the comparator, the process of converting the RSD to a binary number is referred to as code correction.

The RSD to binary conversion is relatively simple. If we represent a 7-digit RSD number as

$C_6C_5C_4C_3C_2C_1C_0 = (B_{p6}B_{n6})(B_{p5}B_{n5})(B_{p4}B_{n4})(B_{p3}B_{n3})(B_{p2}B_{n2})(B_{p1}B_{n1})(B_{p0}B_{n0})$

where each digit is represented by two binary values $(B_p, B_n)$, in which −1=(01), 0=(00), and +1=(10), then a two's complement number can be obtained by subtracting the binary number formed by the $B_n$ bits from the binary number formed by the $B_p$ bits:

$N_{p6}N_{p5}N_{p4}N_{p3}N_{p2}N_{p1}N_{p0} = B_{p6}B_{p5}B_{p4}B_{p3}B_{p2}B_{p1}B_{p0} - B_{n6}B_{n5}B_{n4}B_{n3}B_{n2}B_{n1}B_{n0}$

The resulting number will range from −127 (10000001) to +127 (01111111).

Therefore, the RSD to binary conversion requires only a subtractor. This subtractor has been implemented as part of the synchronizer, and is inserted before the last latch in the synchronizer.
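The subtraction can be illustrated in software. The sketch below is not the hardware subtractor itself; it assumes the seven RSD digits are supplied most significant digit first as the integers −1, 0 and +1, and the function names are illustrative.

```python
def rsd_to_binary(digits):
    """Convert RSD digits (MSB first, each -1, 0 or +1) to a signed integer.

    Each digit maps to a bit pair (Bp, Bn): +1 = (1, 0), 0 = (0, 0),
    -1 = (0, 1). The result is the binary number formed by the Bp bits
    minus the binary number formed by the Bn bits.
    """
    bp = 0
    bn = 0
    for d in digits:
        if d not in (-1, 0, 1):
            raise ValueError("RSD digits must be -1, 0 or +1")
        bp = (bp << 1) | (1 if d == 1 else 0)
        bn = (bn << 1) | (1 if d == -1 else 0)
    return bp - bn


def to_twos_complement(value, bits=8):
    """Bit pattern of a signed value in two's complement of the given width."""
    return value & ((1 << bits) - 1)
```

Because the representation is redundant, different digit strings can give the same value: the digit strings (+1, −1, 0, 0, 0, 0, 0) and (0, +1, 0, 0, 0, 0, 0) both convert to 32.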

Calibrator

The calibration of the ADC can be performed using different algorithms. The preferred design has support for either a digital offset calibration, an analog offset calibration, or a multi-stage digital gain and offset calibration.

Before describing the different calibration methods, we should mention that for an 8-bit ADC the gain errors, which mainly result from the capacitors, can be less than 1/256. This can be achieved by using a basic common centroid structure for the capacitors. Therefore, gain error will not be a contributing factor in the overall ADC errors.

Also, if an application requires only one ADC and an offset of 1% can be tolerated, then offset calibration will not be necessary.

Digital Offset Calibration

This algorithm simply measures the offset of the whole ADC. This is done by shorting the differential inputs of the ADC together and measuring the digital value. In order to reduce the quantization effects the measurement is done on multiple samples (for example, 128 samples).

The offset value is then digitally subtracted from the output of the ADC during normal conversion cycles.

Notice that this method of calibration is sufficient for an 8-bit ADC; as mentioned before, the gain error can be controlled well below the required 1/256.

Analog Offset Calibration

This algorithm relies on using a calibration DAC. This time the PGA is also involved in the calibration process (this is a feature of the current design), and therefore this algorithm will present a better solution, especially if the PGA is set to high gain values.

In this algorithm, the differential inputs of the PGA are shorted together and the output of the ADC is recorded. A DAC is connected to the offset bias inputs of the PGA. The value of the DAC is changed in a feedback loop such that the output of the ADC becomes zero.

The input applied to the DAC is then recorded as the offset correction value.

Multistage Digital Gain and Offset Calibration

This more elaborate algorithm will remove the gain and offset errors from all stages, through a successive algorithm. This algorithm is often suitable for ADC resolutions of more than 8 and less than 12 bits.

The algorithm works as follows:

-   1. The input to the last stage (LSB) of the ADC is set to zero, and the digital values are measured. This is repeated for several cycles (typically 128). The measured value represents the offset for this stage.
-   2. The input to the last stage is set to the mid reference range ((Vrefp−Vrefn)/2). The output is then measured for several cycles. The offset measurement values from Step 1 are included during this phase. The gain error can be found from the measurements.
-   3. Step 1 and Step 2 are recursively repeated for the next bit slices until the MSB. The offset and gain errors from the previous LSB bit-slices will be used in the calculation of offset and gain errors of each stage.

During normal operation, the gain and offset values obtained during the calibration process will be used to correct the digital outputs of the ADC.

Layout Design

The layout design of the ADC will directly affect the performance of the ADC. Considering the ADC is a mixed-signal design by nature, it is important to take into account the interaction between the digital and analog circuits and try to minimize any possible crosstalk affecting the analog circuits. While during the circuit design we addressed this issue by using a fully differential architecture, here we describe techniques used to complement the circuit design.

Floorplan

The placement of the blocks in the ADC is such that the most critical circuits, which are the PGA and the first stage(s) of the ADC, are further away from the main source of digital noise, i.e. the clock generator. The last stages of the ADC are least sensitive to digital noise. The biasing and reference generator are the farthest blocks from the clock generator. In fact most of the short range substrate coupling noise will be absorbed by the ADC stages before reaching the biasing circuits.

Signal Routing

The signal routing is also designed to minimize the interaction between the bias and clock signals. The bias signals are routed on one side of the ADC blocks, and the clock signals on the other. Also inside each block the bias and clock signals run through separate channels, further minimizing the interaction between signals.

In areas where the bias and clock signals cross over each other, appropriate shielding has been used to remove any potential crosstalk.

Power Routing

The VDD and VSS supply voltages surround the ADC. They run on two separate metal layers, which form a parallel plate capacitor to enhance supply decoupling. Inside each bitslice the power lines from the two sides are joined together to form a mesh. In most blocks there are MOS capacitors used to locally decouple the supply voltage.

Bandgap Generator

The compensation capacitor of the bandgap generator is formed using a MiM structure. The resistors are formed using poly without silicide. The input of the opamp has a common centroid structure to reduce mismatch, although mismatch is not a critical parameter for this bandgap generator.

Biasing and Reference Circuits

This layout is located at the bottom end of the ADC floorplan, and as such it contains the two wide metal lines for the supply voltages. The width of these lines is 18 um.

ADC Bit Slice

The main capacitors in each bitslice of the ADC are formed in a common centroid. All bias and reference voltages are decoupled using large MOS capacitors. Supply decoupling capacitors are also used close to the logic circuits.

PGA

The gain setting capacitors of the PGA are formed in a semi-centroid structure to improve matching. Bias lines, including Vrefp and Vrefn, are decoupled using large MOS transistors.

Section D—ADC Design

Interface

The block diagram of the ADC 14 is shown in FIG. 40. The ADC 14 consists of a PGA 28, seven stages of pipeline RSD ADC 70, a clock generator 72, a bias generator 74 and a synchronization and code correction block 76.

The following table sets out the function of the pins of the ADC 14.

| Name | Type | Function |
|---|---|---|
| Enable | Digital input | Active-high enable input. When this input is high, all blocks will be enabled. When this input is low, all blocks will go into the sleep mode. The clock input is also gated to avoid any power dissipation. |
| clock | Digital input | The input clock. |
| inp | Analog input | The positive input to the PGA. |
| inn | Analog input | The negative input to the PGA. |
| inp2 | Analog input | The positive offset input to the PGA. |
| inn2 | Analog input | The negative offset input to the PGA. |
| gain[3:0] | Digital input | Four bits controlling the gain of the PGA, from 0.5 to 8, in steps of 0.5. A value of “0000” sets the gain to 0.5, and a value of “1111” sets the gain to 8. |
| adc_bias[3:0] | Digital input | Four bits setting the bias resistor for the ADC. A value of “0000” sets the bias resistor to 876 Ohm, and a value of “1111” sets the bias resistor to 14 KOhm. The default value should be “1000”. |
| disable[6:1] | Digital input | These signals disconnect one bit slice of the ADC from the previous stage and prepare it for digital calibration. The LSB bit slice does not have such a feature. |
| test[6:1] | Digital input | Set the value used during calibration for a bit slice which has been disconnected from the previous stage. |
| bo[7:0] | Digital output | 8-bit ADC output. |
| VDD | Supply | VDD voltage, nominally set at 1.8 V. |
| VSS | Ground | Ground voltage, set at 0 V. |

Normal Operation

In normal operation the following conditions should be met:

-   Enable input should be set high.
-   “test” and “disable” signals should all be set to low.
-   “gain” is set to the desired value.
-   Clock is running up to a maximum frequency of 20 MHz.

Timing in Normal Operation

The timing diagram of the signals during normal operation is shown in FIG. 41. The input will be presented in two phases of the clock. In the first phase, when clock is high, the input is sampled. Typically during this phase the inputs carry the offsets from the previous circuit, and therefore they are almost the same. In the second phase of the operation, when clock is low, the input is sampled again. This time the inputs carry the actual signal values. Notice that the inputs do not necessarily need to be differential.

The output will be generated four clock cycles later. The latency between the time that Reset(x) has been introduced to the time that the output can be safely read is five and a half clock cycles. Notice that as this ADC is pipelined, it does not have any end-of-conversion indicator.

Sleep Mode

In sleep mode, the enable input is set to low. In this mode all blocks will be disabled.

Calibration Modes

Notice that the calibration modes are not controlled by the ADC, and as such any design that uses this ADC shall implement the relevant control logic to perform any of the desired calibration techniques.

Digital Offset Calibration

In order to perform digital offset calibration the following steps should be taken:

-   1. Enable input is set to high.
-   2. test[6:1] is set to “000000”.
-   3. disable[6:1] is set to “100000”.
-   4. Clock is running up to a maximum frequency of 20 MHz.
-   5. The inp and inn inputs of the PGA should be constant.
-   6. During the first 8 clock cycles no operation is performed.
-   7. For the next 64 clock cycles the digital outputs are added together.
-   8. The final output is then averaged, by a right shift operation by 6 bits.
-   9. The resulting value can be stored and subtracted from subsequent ADC output during normal operation.
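Steps 6 to 9 above can be sketched as control logic operating on a stream of ADC output codes. This is a minimal illustration, assuming the codes have already been read as signed integers, one per clock cycle; the function names and the way samples are supplied are hypothetical.

```python
def measure_digital_offset(codes, settle_cycles=8, sum_cycles=64):
    """Estimate the ADC offset from codes captured with constant PGA inputs.

    codes: sequence of signed ADC output codes, one per clock cycle
    (assumed already converted from two's complement).
    """
    codes = list(codes)
    if sum_cycles & (sum_cycles - 1):
        raise ValueError("sum_cycles must be a power of two")
    window = codes[settle_cycles:settle_cycles + sum_cycles]
    if len(window) != sum_cycles:
        raise ValueError("not enough samples captured")
    total = sum(window)                  # step 7: accumulate 64 outputs
    shift = sum_cycles.bit_length() - 1  # 6 bits for 64 samples
    return total >> shift                # step 8: average by right shift


def correct_code(code, offset):
    """Step 9: subtract the stored offset during normal operation."""
    return code - offset
```

For example, a capture whose first 8 codes are discarded and whose next 64 codes are all 3 yields an offset of 3, so a later raw code of 10 is corrected to 7.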

Analog Offset Calibration

In order to perform analog offset calibration the following steps should be taken:

-   1. Enable input is set to high.
-   2. test[6:1] is set to “000000”.
-   3. disable[6:1] is set to “000000”.
-   4. Clock is running up to a maximum frequency of 20 MHz.
-   5. The inp and inn inputs of the PGA should be constant.
-   6. During the first 8 clock cycles no operation is performed.
-   7. For the next 64 clock cycles the digital outputs are added together.
-   8. If the result is not zero then an appropriate input is applied to the “inp2” and “inn2” offset inputs of the PGA. For this purpose a DAC is required, which should be provided by the calibration control mechanism.
-   9. The steps are repeated until the digital output is zero.
-   10. The resulting value can be stored and applied to the “inp2” and “inn2” inputs of the PGA during normal operation.

Digital Multistage Gain and Offset Calibration

In order to perform digital multistage gain and offset calibration the following steps should be taken:

-   1. Enable input is set to high.
-   2. The PGA gain is set to “0000”, and the differential inputs to the PGA shall remain constant during the calibration process.
-   3. Clock is running up to a maximum frequency of 20 MHz.
-   4. test[6:1] is set to “000000”.
-   5. disable[6:1] is set to “111111”.
-   6. During the first 8 clock cycles no operation is performed.
-   7. For the next 64 clock cycles the digital outputs are accumulated and stored. This value represents the offset value.
-   8. test[6:1] is set to “000001”.
-   9. During the first 8 clock cycles no operation is performed.
-   10. For the next 64 clock cycles the digital outputs are accumulated and stored. Subsequently the offset value measured in Step 7 is subtracted from this. The gain error is then calculated from the resulting value.
-   11. Step 4 to Step 10 are repeated for the next bit slices, while the values of test and disable are shifted by one bit.

The gain and offset values will be used during normal operation to digitally correct the output code from the ADC.

Section E—Callisto Image Processor

Callisto is an image processor designed to interface directly to a monochrome image sensor via a parallel data interface, optionally perform some image processing and pass captured images to an external device via a serial data interface.

FEATURES

-   Parallel interface to image sensor;
-   Frame store buffer to decouple parallel image sensor interface and external serial interface;
-   Double buffering of frame store data to eliminate buffer loading overhead;
-   Low pass filtering and sub-sampling of captured image;
-   Local dynamic range expansion of sub-sampled image;
-   Thresholding of the sub-sampled, range-expanded image;
-   Read-out of pixels within a defined region of the captured image, for both processed and unprocessed images;
-   Calculation of sub-pixel values;
-   Configurable image sensor timing interface;
-   Configurable image sensor size;
-   Configurable image sensor window;
-   Power management: auto sleep and wakeup modes;
-   External serial interface for image output and device management;
-   External register interface for register management on external devices.

Environment

Callisto interfaces to both an image sensor, via a parallel interface, and to an external device, such as a microprocessor, via a serial data interface. Captured image data is passed to Callisto across the parallel data interface from the image sensor. Processed image data is passed to the external device via the serial interface. Callisto's registers are also set via the external serial interface.

Function

BLACK-BOX DESCRIPTION

The Callisto image processing core accepts image data from an image sensor and passes that data, either processed or unprocessed, to an external device using a serial data interface. The rate at which data is passed to that external device is decoupled from whatever data read-out rates are imposed by the image sensor.

The image sensor data rate and the image data rate over the serial interface are decoupled by using an internal RAM-based frame store. Image data from the sensor is written into the frame store at a rate to satisfy image sensor read-out requirements. Once in the frame store, data can be read out and transmitted over the serial interface at whatever rate is required by the device at the other end of that interface.

Callisto can optionally perform some image processing on the image stored in its frame store, as dictated by user configuration. The user may choose to bypass image processing and obtain access to the unprocessed image. Sub-sampled images are stored in a buffer but fully processed images are not persistently stored in Callisto; fully processed images are immediately transmitted across the serial interface. Callisto provides several image processing functions:

-   Sub-sampling;
-   Local dynamic range expansion;
-   Thresholding;
-   Calculation of sub-pixel values;
-   Read-out of a defined rectangle from the processed and unprocessed image.

Sub-sampling, local dynamic range expansion and thresholding are typically used in conjunction, with dynamic range expansion performed on sub-sampled images, and thresholding performed on sub-sampled, range-expanded images. Dynamic range expansion and thresholding are performed together, as a single operation, and can only be performed on sub-sampled images. Sub-sampling, however, may be performed without dynamic range expansion and thresholding. Retrieval of sub-pixel values and image region read-out are standalone functions.

The details of these functions are provided below.

FUNCTIONS

Image Coordinate System

This document refers to pixel locations within an image using an x-y coordinate system where the x coordinate increases from left to right across the image, and the y coordinate increases down the image from top to bottom. It is also common to refer to pixel locations using row and column numbers. Using the x-y coordinate system used in this document, a pixel's row location refers to its y coordinate, and a pixel's column location refers to its x coordinate. The origin (0,0) of the x-y coordinate system used is located at the top left corner of the image. See FIG. 43. Pixel coordinates define the centre of a pixel.

The term “raster order” is also used in this document and refers to an ordering of pixels beginning at the top left corner of the image, moving left to right, and top to bottom. Callisto assumes that pixels from the image sensor are received in this order: the pixel at location (0,0) is received first, then the next pixel to the right, continuing across the line. All lines are processed in this order from top to bottom. This assumption means that there is no coordinate translation between input and output. According to the example shown in FIG. 43, raster order would be p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20, p21, etc.

All image coordinates are relative to the image sensor window and not the image sensor itself.

Image Sub-Sampling

The captured image is sub-sampled by passing a 3×3 window over the entire image. The “motion” of the window over the image is simply left-to-right, top-to-bottom.

Each 3×3 window produces a single pixel in the output image, thus producing an image that has nine times fewer pixels than the original image (see FIG. 44). The nine pixels in the window are averaged to obtain the output pixel:

outputPixel = 1/9*(p0+p1+p2+p3+p4+p5+p6+p7+p8);

The algorithm for producing the sub-sampled image is:

    foreach 3×3 window loop
        outputPixel = 0;
        foreach pixel in the window loop
            outputPixel += pixel;
        end loop;
        write (1/9) * outputPixel;
    end loop;

Where there is insufficient pixel data to form a complete 3×3 window, which happens along the right and bottom edges of the original image if its width and height are not multiples of 3, pixels along the edges of the image will be replicated to fill the 3×3 window. FIG. 45 shows how pixels are replicated during sub-sampling when the sub-sampling window goes beyond the edges of the image.
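The sub-sampling and edge replication can be sketched in software as follows. This is an illustration only: it assumes the image is held as a list of rows of integer pixel values, that replication means repeating the last available row or column (the exact pattern of FIG. 45 is not reproduced here), and that the division by 9 truncates. None of these details are confirmed by the description.

```python
def subsample_3x3(image):
    """Average non-overlapping 3x3 windows of a greyscale image.

    image: non-empty list of equal-length rows of integer pixel values.
    Windows that run past the right or bottom edge are filled by
    repeating the last column or row (assumed replication scheme).
    """
    height = len(image)
    width = len(image[0])
    out_height = (height + 2) // 3
    out_width = (width + 2) // 3
    result = []
    for oy in range(out_height):
        row = []
        for ox in range(out_width):
            total = 0
            for dy in range(3):
                for dx in range(3):
                    y = min(3 * oy + dy, height - 1)  # replicate bottom edge
                    x = min(3 * ox + dx, width - 1)   # replicate right edge
                    total += image[y][x]
            row.append(total // 9)  # truncating divide (assumption)
        result.append(row)
    return result
```

A 3×3 image containing the values 1 to 9 reduces to the single pixel 5, and a 4-pixel-wide image produces two output columns, the second built entirely from the replicated last column.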

Local Dynamic Range Expansion

The local dynamic range expansion function is intended to be used to remove the effects of variation in illumination. In particular, it allows thresholding to be performed using a fixed threshold.

The general algorithm for dynamic range expansion is: for each pixel, a histogram of the pixels in a window of specified radius about the current pixel is constructed. Then the value which a specified fraction of the pixels in the window are less than is determined. This becomes the black level. The value which a specified fraction of the pixels are greater than is also determined, and this becomes the white level. Finally, the current pixel is mapped to a new value as follows: if its original value is less than the black level it is mapped to 0. If its value is greater than the white level it is mapped to 255. Values between black and white are mapped linearly into the range 0-255.

In Callisto, the radius of the window is fixed at 2, which approximates to a 5×5 rectangle. The fractions used are 2% for both the black and white levels. Since 2% of 25 (5*5 pixels) is 0.5, it suffices to determine the minimum and maximum pixel values in a window when determining black and white levels. Callisto's algorithm works by passing a 5×5 window over the image, with the pixel being processed situated in the centre of the window (see FIG. 46). When the pixel being processed is no closer than 2 pixels from the top or bottom, and 2 pixels from the left or right of the image, there are sufficient neighbouring pixels to construct a full 5×5 window. When this condition does not hold there are not enough pixels to construct a 5×5 window, and in this case dynamic range expansion is performed on the available pixels; in FIG. 47 there are only 16 of 25 pixels available in the window for the pixel being processed, so only these 16 are considered in calculating the dynamic-range-expanded value for the pixel being considered.

For each pixel being processed, a window around that pixel is constructed as described above. For all the pixels in that window, including the pixel being processed, both the minimum and maximum pixel values are recorded. The new pixel value is calculated by mapping linearly into the range 0 to 255 according to the max and min values in the current window. That is:

newPixelValue = 255*(pixelValue−min)/(max−min)

If the max and min values are the same, the new pixel value is instead set to 255. The algorithm described in pseudo code:

    foreach pixel in image loop
        construct 5×5 window;
        min = 255;
        max = 0;
        foreach windowPixel in 5×5 window loop
            if windowPixel > max then
                max = windowPixel;
            end if;
            if windowPixel < min then
                min = windowPixel;
            end if;
        end loop;
        if max = min then
            pixel = 255;
        else
            pixel = 255*(pixel−min)/(max−min);
        end if;
    end loop;
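A software rendering of this algorithm is sketched below. It assumes the image is a list of rows of 8-bit integer values, that results are written to a separate output image so that each window sees only original pixel values, and that the division truncates; these are assumptions of the sketch rather than statements about the hardware.

```python
def expand_dynamic_range(image, radius=2):
    """Local dynamic range expansion using the window minimum and maximum.

    image: non-empty list of equal-length rows of pixel values in 0..255.
    Near the edges, only the pixels that actually exist inside the
    window are considered.
    """
    height = len(image)
    width = len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[wy][wx]
                for wy in range(max(0, y - radius), min(height, y + radius + 1))
                for wx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            lo = min(window)
            hi = max(window)
            if hi == lo:
                result[y][x] = 255  # flat window: defined to map to 255
            else:
                result[y][x] = 255 * (image[y][x] - lo) // (hi - lo)
    return result
```

In this sketch a uniformly grey image maps to all 255, and within any non-flat window the darkest pixel maps to 0 and the brightest to 255.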

Thresholding

Thresholding is a simple function that converts an 8-bit pixel value into a 1-bit pixel value based on the comparison of the 8-bit pixel value with a pre-defined threshold value, stored in a Callisto register. This is the pseudo-code that describes the algorithm:

    foreach pixel in image loop
        if pixel >= threshold then
            pixel = 1;
        else
            pixel = 0;
        end if;
    end loop;

Combining Thresholding and Dynamic Range Expansion

Let's assume that t is the threshold value, that v is the pixel value being dynamic-range-expanded, and that a is the dynamic-range-expanded pixel value. Thresholding requires the following comparison:

a >= t

Substituting the dynamic range expansion equation yields:

255*(v−min)/(max−min) >= t

And by re-arranging:

255*(v−min) >= t*(max−min)

v−min >= (t/255)*(max−min)

v >= ((t/255)*(max−min))+min

By combining dynamic range expansion and thresholding a complicated divide (a divide by max−min) is replaced with a simple constant divide. The divide may be eliminated altogether by requiring the user to specify t/255 rather than just t. This equation holds true when min=max.
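The equivalence of the combined comparison and the two-step approach can be checked with a short sketch. It uses the cross-multiplied form 255*(v−min) >= t*(max−min) from the first line of the re-arrangement, which needs no divide at all; the function names are illustrative, and the two-step version assumes a truncating integer divide.

```python
def threshold_two_step(v, lo, hi, t):
    """Expand v against the window range [lo, hi], then compare with t."""
    expanded = 255 if hi == lo else 255 * (v - lo) // (hi - lo)
    return 1 if expanded >= t else 0


def threshold_combined(v, lo, hi, t):
    """Single comparison equivalent to expansion followed by thresholding.

    When hi == lo the window is flat, so v == lo and both sides are 0;
    the comparison is then true, matching the expanded value of 255.
    """
    return 1 if 255 * (v - lo) >= t * (hi - lo) else 0
```

For integer thresholds in the range 0 to 255 the two functions agree for every pixel value inside the window range, including the flat-window case min=max.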

Sub-Pixel Read

Sub-pixel read allows the user to ascertain the grey level value at an arbitrary location which lies between pixels in the captured image, i.e. sub-pixels.

FIG. 48 shows the location of the desired sub-pixel with respect to actual image pixels. Sub-pixel coordinates are expressed as 8.4 fixed point values. The values dx and dy in FIG. 48 simply refer to the fractional portion of the sub-pixel coordinates. The grey scale value v for the pixel shown, which lies between pixels v00, v10, v01, v11, is calculated as follows:

v0 = v00 + dx*(v10 − v00);

v1 = v01 + dx*(v11 − v01);

v = v0 + dy*(v1 − v0);
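The interpolation can be sketched as follows. The sketch assumes that an 8.4 fixed point coordinate is passed as an integer whose low 4 bits are the fraction, that v10 is the pixel to the right of v00 and v01 the pixel below it, and that the requested location is not on the last row or column; floating point is used for clarity, so hardware rounding is not modelled.

```python
def sub_pixel_read(image, x_fixed, y_fixed):
    """Bilinear interpolation at an 8.4 fixed point (x, y) location.

    image: list of rows of pixel values.
    x_fixed, y_fixed: integers with 8 integer bits and 4 fractional bits.
    """
    x0, frac_x = x_fixed >> 4, x_fixed & 0xF
    y0, frac_y = y_fixed >> 4, y_fixed & 0xF
    dx = frac_x / 16.0
    dy = frac_y / 16.0
    v00 = image[y0][x0]
    v10 = image[y0][x0 + 1]      # neighbour to the right (assumed layout)
    v01 = image[y0 + 1][x0]      # neighbour below (assumed layout)
    v11 = image[y0 + 1][x0 + 1]
    v0 = v00 + dx * (v10 - v00)  # interpolate along the upper row
    v1 = v01 + dx * (v11 - v01)  # interpolate along the lower row
    return v0 + dy * (v1 - v0)   # interpolate between the two rows
```

With the 2×2 image [[0, 16], [32, 48]], the location (0.5, 0.5), encoded as (0x08, 0x08), interpolates to 24.0.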

To reduce the interrupt processing overhead on the processor, Callisto supports calculating many sub-pixel values in a single command. When Callisto begins a sub-pixel read operation it is told how many sub-pixel values to calculate, placing all the interpolated pixel values into a single message on the serial interface back to the processor.

Unprocessed Image Region Read Function

The image region read function of Callisto allows the user to read all the pixel values out of a defined rectangular region of the unprocessed image in a single operation. The region size and location may be arbitrarily set. Image data is returned in raster order.

The unprocessed image read function operates on the data in the image frame store, i.e. the unprocessed image. Because the image region to be read may be at an arbitrary location, and of arbitrary size, it is possible to define a region that exactly fits the image. That is, using this function it is possible to read back the entire image in the frame store, unprocessed, thus providing a bypass path of the image processing functions. It would also be possible to read the entire image in various ways using this function:

-   A set of tiles;
-   A set of bands;
-   Line by line;
-   etc.

Processed Image Region Read Functions

Like the unprocessed image read, the processed image, or a part of it, may be read by the user. Image data is returned in raster order.

The user may specify what part of the processed image they want to read by defining a rectangular region. The coordinates used to specify this region lie in the processed image so that the region defined is aligned to a 3×3 boundary in the unprocessed image. The user has two choices as to the type of image processing to be performed. Either:

-   Sub-sample only; or
-   Sub-sample + expand dynamic range + threshold.

Out of Image Bounds

For image region read functions Callisto allows the user to arbitrarily specify the position and size of the region independently of the size of the image. This creates the possibility that some or all of the specified region may lie outside of the image. Callisto does not perform any bounds checking in this regard. If the user does specify a region where all or parts of it lie outside the image, pixel values returned for those parts of the region outside the image will have undefined values.

There are no side effects or consequences of specifying regions that are not wholly within an image other than that the pixel values returned cannot be predicted.

Direct Writing to Frame Store Buffer

Callisto writes valid pixel data on the image sensor interface to the frame store buffer; this data normally comes from an image sensor. Callisto provides a mode of operation which allows the user to directly write pixel data into the frame store buffer by sending Callisto a “write to frame store” message. By putting Callisto into the appropriate mode (setting the FrameWrite bit in the configuration register), the user is able to write data, four pixels at a time, directly into the frame store buffer by sending Callisto a FrameStoreWrite message. For the first write of a frame the user must set the S bit in the message to ‘1’. Once a message is sent the user must wait for a FrameStoreWriteAcknowledge message before sending the next FrameStoreWrite message.

Callisto uses the ImageSensorWindow setting to determine when a completeframe has been written into the frame store buffer.

Serial Interface

The serial interface to Callisto is used for several purposes:

-   Processor issuing Callisto commands.
-   Processor issuing register access commands (read and write).
-   Callisto returning register data as a result of a register read command.
-   Callisto returning image data.
-   Error signalling and recovery.
-   High level image sensor frame synchronisation.
-   Frame store write.

Message Types and Formats

There are seven Callisto message types, as set out in the following table:

| Message Type | Message Type Code | Message Source | Comment |
|---|---|---|---|
| Register access | b′000 | Processor | Used to access Callisto's registers. Can either specify a read or a write. |
| Callisto command | b′001 | Processor | Used to tell Callisto to perform an image processing function. Can be one of: unprocessed image region read, processed image region read, sub-sampled image region read, sub-pixel read. |
| Register data | b′010 | Callisto | Message containing the data requested by a register read request from the Processor. |
| Command data | b′011 | Callisto | Message containing data produced as a result of executing a command. |
| Frame synchronisation | b′100 | Processor & Callisto | Messages used for high level software frame processing synchronisation. |
| Frame store write | b′101 | Processor | Allows the user to write data directly into the frame store buffer via the serial interface. |
| Frame store write acknowledge | b′110 | Callisto | Acknowledges the frame store write message, indicating to the user that another frame store write message may be issued. |

All messages consist of a constant message marker byte, common to all messages (used for message synchronisation), followed by a control byte, specific to each message type, followed by a varying number of data bytes depending on the message type. The message marker byte is set at 0x7E.

Note that all unused bits in the control byte should always be set to ‘0’.

FIG. 49 shows the general format for Callisto messages.

The following table shows a summary of the control byte arrangements for each message type:

| Message Type | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
|---|---|---|---|---|---|---|---|---|
| Register access | b′0 | E | W | N1 | N0 | T2 | T1 | T0 |
| Callisto command | b′0 | b′0 | P | C1 | C0 | T2 | T1 | T0 |
| Register data | b′0 | E | I | N1 | N0 | T2 | T1 | T0 |
| Command data | b′0 | b′0 | I | C1 | C0 | T2 | T1 | T0 |
| Frame synchronisation | b′0 | b′0 | b′0 | S1 | S0 | T2 | T1 | T0 |
| Frame store write | b′0 | b′0 | b′0 | b′0 | S | T2 | T1 | T0 |
| Frame store write acknowledge | b′0 | b′0 | I | b′0 | ER | T2 | T1 | T0 |

The following table shows control byte field descriptions:

| Field | Name | Description |
|---|---|---|
| T[2:0] | Message Type | b′000 - Register Access; b′001 - Callisto Command; b′010 - Register Data; b′011 - Command Data; b′100 - Frame Synchronisation; b′101 - Frame Store Write; b′110 - Frame Store Write Acknowledge |
| C[1:0] | Command Type | b′00 - Unprocessed Image Read; b′01 - Sub-pixel Read; b′10 - Sub-sampled Image Read; b′11 - Processed Image Read |
| N[1:0] | Number of Bytes | Defines the number of data bytes (minus one) contained in the message: b′00 - 1 byte; b′01 - 2 bytes; b′10 - 3 bytes; b′11 - 4 bytes. Set to b′00 for a register read. |
| E | External | Used to indicate that a register access command is for an external device connected to Callisto's external register bus. |
| W | Write | When set to ‘1’ in a register access message, indicates a register write. |
| P | Parameters | When set to ‘1’ indicates that a Callisto Command message contains command parameters also. |
| I | Interrupt | When set to ‘1’ in a message from Callisto, indicates that the state of one of the COR bits in the status register has changed. |
| S[1:0] | Synchronisation Message Type | b′00 - Ready For New Frame (from processor); b′01 - Finished Frame Processing (from processor); b′10 - Received New Frame (from Callisto) |
| S | Start Of Frame | In a Frame Store Write message indicates first write of a frame. |
| ER | Frame Store Write Error | In a Frame Store Write Acknowledge message indicates that the previous Frame Store Write could not be performed because the FrameWrite bit in the configuration register was not set. |
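The control byte layout lends itself to simple bit masking. The following is a minimal sketch of how host software might decode a control byte; the names `MESSAGE_TYPES`, `message_type` and `field` are hypothetical helpers introduced for illustration, while the type codes and bit positions are taken from the tables above.

```python
# Message type codes T[2:0], taken from the field description table.
MESSAGE_TYPES = {
    0b000: "register access",
    0b001: "callisto command",
    0b010: "register data",
    0b011: "command data",
    0b100: "frame synchronisation",
    0b101: "frame store write",
    0b110: "frame store write acknowledge",
}


def message_type(control: int) -> str:
    """Return the message type name encoded in T[2:0] of a control byte."""
    code = control & 0b111
    if code not in MESSAGE_TYPES:
        raise ValueError(f"illegal message type b'{code:03b}")
    return MESSAGE_TYPES[code]


def field(control: int, high: int, low: int) -> int:
    """Extract bits high..low (inclusive) of a control byte."""
    width = high - low + 1
    return (control >> low) & ((1 << width) - 1)
```

For example, `field(control, 4, 3)` yields C[1:0] for a Callisto command and N[1:0] for a register access, and `field(control, 5, 5)` yields the P, W or I bit depending on the message type.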

Callisto Interrupts

All messages from Callisto contain an interrupt (I) bit in the control byte to indicate that one of the COR (clear on read) bits in the status register has been set and that the user should examine the status register. Once this condition has occurred and Callisto has set an I bit in a message, it will continue to set the I bit in subsequent messages until the status register has been read.

Register Access Message Type

Callisto's registers are accessed by messages sent to it on its serial interface. The message consists of a control byte, an address byte and 0 to 4 data bytes. FIG. 50 shows the format of register access messages. For registers whose width is greater than a single byte, least significant bytes will appear in the message first. Taking the message in FIG. 50 as an example of writing to a 32 bit register, data byte 0 would be written to bits 7:0 of the register, data byte 1 to bits 15:8, data byte 2 to bits 23:16 and data byte 3 to bits 31:24.

The following table shows the control byte format for register access messages:

| Field | Bits | Width | Description |
|---|---|---|---|
| T[2:0] | 2:0 | 3 | Type. Type of message. Set to “000” for register access. |
| N[1:0] | 4:3 | 2 | Number Write Bytes. Indicates the number of bytes of data to be written during a register write, less one, where “00” indicates 1 byte and “11” indicates 4 bytes. Set to “00” for read. |
| W | 5 | 1 | Write. If this bit is set to ‘1’ indicates a register write. Setting to ‘0’ indicates a read. |
| E | 6 | 1 | External. If set to ‘1’ indicates the register operation is for an external device, otherwise a Callisto register access. |
| N/A | 7 | 1 | Not Used. Should be set to ‘0’. |
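As an illustration of the register access format, the sketch below assembles read and write messages on the host side. It assumes the 0x7E marker byte precedes the control byte, as stated for all messages; the function names and signatures are hypothetical and not taken from the specification.

```python
def register_write(address: int, value: int, width: int,
                   external: bool = False) -> bytes:
    """Build a register write message for a register `width` bytes wide."""
    if not 1 <= width <= 4:
        raise ValueError("width must be 1..4 bytes")
    control = 0b000                     # T[2:0] = register access
    control |= (width - 1) << 3         # N[1:0] = number of data bytes, less one
    control |= 1 << 5                   # W = 1 for a write
    control |= int(external) << 6       # E = 1 for an external device
    data = value.to_bytes(width, "little")  # least significant byte first
    return bytes([0x7E, control, address & 0xFF]) + data


def register_read(address: int, external: bool = False) -> bytes:
    """Build a register read message (T = 000, N = 00, W = 0)."""
    control = int(external) << 6
    return bytes([0x7E, control, address & 0xFF])
```

Writing 0x11223344 to a 32 bit register therefore places 0x44 in data byte 0 and 0x11 in data byte 3, matching the byte ordering described above.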

Callisto Command Message Type

The user asks Callisto to perform its tasks by sending it messages which specify which operation to perform. These command messages consist of a control byte, followed by zero or one parameter byte-count bytes (pbcount), followed by a number of parameter bytes as specified by pbcount, or as implied by the command type. FIG. 51 shows the format for the command message. pbcount is set to the number of parameter bytes less one, so a value of zero signifies that there will be one parameter byte.

The following table shows the control byte format for Callisto command messages:

| Field | Bits | Width | Description |
|---|---|---|---|
| T[2:0] | 2:0 | 3 | Type. Type of message. Set to “001” for Callisto command. |
| C[1:0] | 4:3 | 2 | Command Type. Specifies the type of command: “00” Unprocessed image read; “01” Sub-pixel read; “10” Sub-sampled image read; “11” Processed image read |
| P | 5 | 1 | Parameter. When set to ‘1’ indicates that this command has its parameters included in the message. Otherwise use parameters defined by Callisto register settings. |
| N/A | 7:6 | 2 | Not Used. Should be set to “00”. |

Number of pbcount bytes per command:

| Command Type | Number of pbcount bytes |
|---|---|
| Unprocessed image read | 0 |
| Processed image read | 0 |
| Sub-sampled image read | 0 |
| Sub-pixel read | 1 |

Register Data Message Type

These messages are sent from Callisto back to the processor, as a result of a register read message being received by Callisto. The message consists of a control byte, a register address byte and up to four bytes of data. See FIG. 52. Taking the message in FIG. 52 as an example of reading from a 32 bit register, data byte 0 would be taken from bits 7:0 of the register, data byte 1 from bits 15:8, data byte 2 from bits 23:16 and data byte 3 from bits 31:24.

The following table shows the control byte format for register data messages:

| Field | Bits | Width | Description |
|---|---|---|---|
| T[2:0] | 2:0 | 3 | Type. Type of message. Set to “010” for register data. |
| N[1:0] | 4:3 | 2 | Number Data Bytes. Indicates the number of bytes of data, less one, where “00” means 1 byte and “11” means 4 bytes. |
| I | 5 | 1 | Interrupt. Indicates that some event has occurred which has changed the status register. An indicator that software should examine the status register contents. |
| E | 6 | 1 | External. If set to ‘1’ indicates the original register read was for an external device, otherwise it was a Callisto register access and the bit is set to ‘0’. |
| N/A | 7 | 1 | Not Used. Should be set to ‘0’. |

Command Data Message Type

These messages return data back to the processor as a result of processing a command. The message comprises a control byte, two data count bytes, followed by a number of data bytes as specified by the data count bytes. See FIG. 53. The data count bytes specify how many bytes of data are in the message, less one, so that a value of 0x0000 means that the message contains a single byte of data. Count byte 0 is the least significant byte of the two bytes.

The following table shows the control byte format for command data messages:

| Field | Bits | Width | Description |
|---|---|---|---|
| T[2:0] | 2:0 | 3 | Type. Type of message. Set to “011” for image data message. |
| C[1:0] | 4:3 | 2 | Command Type. Specifies the type of command for which this is the data being returned: “00” Unprocessed image read; “01” Sub-pixel read; “10” Sub-sampled image read; “11” Processed image read |
| I | 5 | 1 | Interrupt. Indicates that some event has occurred which has changed the status register. An indicator that software should examine the status register contents. |
| N/A | 7:6 | 2 | Not used. Should be set to “00”. |

The command type field C indicates the type of command that was executed to produce the result data in the image data message. The interrupt I field indicates that some event has occurred during processing and that the contents of the status register should be examined.
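The "count less one" convention and the least-significant-byte-first ordering of the count bytes are easy to get wrong, so a short sketch of a host-side parser may help. It assumes the message buffer starts with the 0x7E marker byte; the function name and the returned dictionary keys are hypothetical.

```python
def parse_command_data(message: bytes) -> dict:
    """Split a command data message into command type, I bit and payload."""
    if len(message) < 5 or message[0] != 0x7E:
        raise ValueError("not a complete Callisto message")
    control = message[1]
    if control & 0b111 != 0b011:            # T[2:0] must be command data
        raise ValueError("not a command data message")
    # Count byte 0 is the least significant byte; the count is stored less one.
    count = int.from_bytes(message[2:4], "little") + 1
    data = message[4:4 + count]
    if len(data) != count:
        raise ValueError("message is truncated")
    return {
        "command": (control >> 3) & 0b11,   # C[1:0]
        "interrupt": bool(control & 0x20),  # I bit
        "data": data,
    }
```

With this convention the largest message carries 65536 data bytes, when both count bytes are 0xFF.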

Format of Command Data

Data returned in command data messages is always pixel data, i.e. pixel values. In the case of image region read commands, that pixel data is returned in raster order. In the case of the sub-pixel read command the pixels are returned in the order in which their corresponding coordinates were supplied. Except for the processed image region read command, all pixel data is 8 bit. In the case of the processed image region read command the pixel data returned is 1 bit and padded so that the start of each line occurs on a byte boundary.

The pixel values returned as a result of executing a processed image read command are single bit values. These values are packed into bytes so that each byte contains 8 pixel values. Image line boundaries always correspond to byte boundaries, and in the case where the image width is not a multiple of 8, the last byte of a line will be padded with a defined bit value so that the next line begins on a byte boundary. The value of the padding bit is defined in the Callisto configuration register. FIG. 54 shows how single bit pixel values are packed for an image that is 132×132 pixels. 132 bits requires 16 full bytes, and 4 bits of a 17th byte. The diagram shows that the full image requires 2244 bytes and that each of the 132 lines consists of 17 bytes. Pixels are packed in raster order using the least significant bit first.
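The packing rule can be sketched as follows. The functions below compute the number of bytes per line and unpack a processed image into rows of single bit values, discarding the padding bits. The function names are hypothetical; the least-significant-bit-first ordering and byte-aligned lines are taken from the description above.

```python
def bytes_per_line(width: int) -> int:
    """Bytes needed for one line of 1 bit pixels, rounded up to a byte."""
    return (width + 7) // 8


def unpack_processed_image(data: bytes, width: int, height: int) -> list:
    """Unpack 1 bit pixels (LSB first, byte-aligned lines) into rows."""
    stride = bytes_per_line(width)
    if len(data) != stride * height:
        raise ValueError("data length does not match region size")
    rows = []
    for y in range(height):
        line = data[y * stride:(y + 1) * stride]
        # Pixel x lives in bit (x % 8) of byte (x // 8); padding is skipped.
        rows.append([(line[x // 8] >> (x % 8)) & 1 for x in range(width)])
    return rows
```

For the 132×132 example, `bytes_per_line(132)` gives 17 and the whole image occupies 17 × 132 = 2244 bytes.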

Frame Synchronisation Message Type

These messages are intended to be used for software frame processing synchronisation. There are three different forms of this message, as shown in the following table:

| Frame Sync Message Type | Frame Sync Type Code | Message Source | Comment |
|---|---|---|---|
| Ready for new frame | b′00 | Processor | Indicates to Callisto that the processor is ready to process a new frame. Callisto will send a “received new frame” message in response. |
| Finished frame processing | b′01 | Processor | Indicates to Callisto that the processor has finished processing the current frame when the current command has finished execution. This unlocks the frame buffer and allows new image sensor frames to be written. |
| Received new frame | b′10 | Callisto | This is the response to the “ready for new frame” message and indicates that Callisto has a new frame ready for processing. |

Frame sync message - control byte format

| Field | Bits | Width | Description |
|---|---|---|---|
| T[2:0] | 2:0 | 3 | Type. Type of message. Set to “100” for frame sync message. |
| S[1:0] | 4:3 | 2 | Frame Sync Type. Indicates the type of frame sync message: “00” - Ready for new frame; “01” - Finished frame processing; “10” - Received new frame |
| I | 5 | 1 | Interrupt. Indicates that some event has occurred which has changed the status register. An indicator that software should examine the status register contents. This bit only appears in messages from Callisto, i.e. when Frame Sync Type is “10”. |
| N/A | 7:6 | 2 | Not used. Should be set to “00”. |

Frame Store Write Message Type

This message type enables the user to write pixel data directly into the frame store buffer. To be able to perform this function the ‘WriteFrame’ bit in the configuration register must be set first. This message consists of the 0x7E byte, a control byte and four bytes of pixel data, supplied in raster order.

Frame store write message - control byte format

| Field | Bits | Width | Description |
|---|---|---|---|
| T[2:0] | 2:0 | 3 | Type. Type of message. Set to “101” for frame store writes. |
| S | 3 | 1 | Start of Frame. Setting this bit indicates that the message contains the first byte of a new frame. |
| N/A | 7:4 | 4 | Not Used. Set to b′0000. |

Frame Store Write Acknowledge Message Type

This message acknowledges a frame store write message, notifying the user that another frame store write message may be issued. The message consists of a 0x7E byte and a control byte.

Frame store write acknowledge message - control byte format

| Field | Bits | Width | Description |
|---|---|---|---|
| T[2:0] | 2:0 | 3 | Type. Type of message. Set to “110” for frame store write acknowledge. |
| ER | 3 | 1 | Error. This bit is set by Callisto when a FrameStoreWrite message was received but the configuration register bit WriteFrame was not set. |
| N/A | 4 | 1 | Not Used. Set to b′0. |
| I | 5 | 1 | Interrupt. Indicates that some event has occurred which has changed the status register. An indicator that software should examine the status register contents. |
| N/A | 7:6 | 2 | Not Used. Set to b′00. |
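The frame store write exchange can be sketched from the host's point of view as follows. The generator splits a frame into four-pixel messages with the S bit set on the first one, and a second helper inspects the acknowledge. Both function names are hypothetical, and the transport that sends each message and waits for its acknowledge is deliberately left out.

```python
def frame_store_write_messages(pixels: bytes):
    """Yield one frame store write message per four pixels, in raster order."""
    if len(pixels) % 4 != 0:
        raise ValueError("pixel data must be a multiple of four bytes")
    for offset in range(0, len(pixels), 4):
        control = 0b101                 # T[2:0] = frame store write
        if offset == 0:
            control |= 1 << 3           # S = 1 on the first write of a frame
        yield bytes([0x7E, control]) + pixels[offset:offset + 4]


def acknowledge_ok(message: bytes) -> bool:
    """Return True for a frame store write acknowledge with ER clear."""
    if len(message) != 2 or message[0] != 0x7E:
        raise ValueError("not a frame store write acknowledge message")
    control = message[1]
    if control & 0b111 != 0b110:
        raise ValueError("not a frame store write acknowledge message")
    return (control >> 3) & 1 == 0      # ER is bit 3
```

A host would send one yielded message, wait for the acknowledge, check it with `acknowledge_ok`, and only then send the next message.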

Callisto Commands

Callisto is able to perform four operations: unprocessed image read, processed image read, sub-sampled image read and sub-pixel read.

Commands are issued to Callisto by sending it command messages. Arguments or parameters for commands may be specified in one of two ways. The first is to set command-specific settings in the appropriate register, as defined in the “Operation” chapter. The second method is to supply the parameters with the command itself. In this case a slightly different form of the command is used to indicate to Callisto that it should use parameters supplied with the command and not from a register setting.

Telling Callisto to use arguments supplied with the command rather than those specified in its registers is done by setting the P bit in the command message control byte to ‘1’. Overlapping command execution with command transmission is not supported; while Callisto is busy executing a command it cannot receive any new commands. The user should be careful not to issue a new command until the previous command has finished execution, indicated by the processor receiving the corresponding command data message. If commands are received while Callisto is busy executing a command it will enter an error state and indicate this to the processor via the serial interface. See Section for details.

The following sections describe the individual commands and how to construct the command message to perform them.

Unprocessed Image Read

This command tells Callisto to return all of the pixel data within a defined region of the unprocessed image. This command doesn't require any parameter count bytes following the control byte as it has a fixed number of arguments. This command expects two arguments (expressed as two bytes): TopLeftX, TopLeftY. An example message for this command is shown in FIG. 58.

The actual execution of this command relies on an additional two parameters: SizeX and SizeY. These two parameters must be specified in the appropriate register. Note that this command always expects two arguments, and it is illegal not to have the P bit set.

Different forms of unprocessed image read command:

| Has Parameters | Control Byte Value | Comments |
|---|---|---|
| No | b′00000001 | Illegal form of this command. P bit must always be set and arguments supplied. |
| Yes | b′00100001 | Valid form of this command. |

Processed Image Read

This command tells Callisto to return all the pixel values in the defined region of the processed image. This command requires four arguments (expressed in four bytes) if supplied: TopLeftX, TopLeftY, SizeX and SizeY. The size parameters are in processed image units, and TopLeftX and TopLeftY are expressed in processed image coordinates. This command returns pixel values from the processed image after sub-sampling, dynamic range expansion and thresholding, so all pixels are single bit values. FIGS. 59a and 59b show two example formats of this command.

Different forms of processed image read command:

| Has Parameters | Control Byte Value | Comments |
|---|---|---|
| No | b′00011001 | Size and TopLeft arguments taken from Callisto register. |
| Yes | b′00111001 | Size and TopLeft arguments supplied with command. |

Sub-Sampled Image Read

This command is identical to the processed image read command except that the processed image in this case has not had dynamic range expansion and thresholding performed. This means that the pixels returned are 8 bit values. Everything else about this command is the same. FIGS. 60a and 60b show two example formats for this command.

Different forms of sub-sampled image read command:

| Has Parameters | Control Byte Value | Comments |
|---|---|---|
| No | b′00010001 | Size and TopLeft arguments taken from Callisto register. |
| Yes | b′00110001 | Size and TopLeft arguments supplied with command. |

Sub-Pixel Read

This command tells Callisto to calculate the sub-pixel values at the specified sub-pixel coordinates. This command has only one form and its arguments must always be supplied in the command message. This command has one pbcount byte following the control byte which indicates how many coordinate bytes are contained in the message. pbcount defines the number of coordinate bytes less one, i.e. two (b′00000010) means 3 bytes, and must represent a number of bytes that is divisible by 3. FIG. 61 shows the format for a sub-pixel read command with 8 sub-pixel coordinates.

Different forms of sub-pixel read command:

| Has Parameters | Control Byte Value | Comments |
|---|---|---|
| No | b′00001001 | Illegal form of command. Must have arguments supplied. |
| Yes | b′00101001 | Valid form of command. |
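The four command forms can be summarised in a single host-side builder. The sketch below is illustrative only: the names `COMMAND_CODES`, `FIXED_ARGS` and `command_message` are hypothetical, the 0x7E marker byte is assumed to precede the control byte as for all messages, and parameter bytes are passed through without interpretation.

```python
COMMAND_CODES = {
    "unprocessed": 0b00,
    "sub_pixel": 0b01,
    "sub_sampled": 0b10,
    "processed": 0b11,
}
# Number of argument bytes when parameters are supplied with the command.
FIXED_ARGS = {"unprocessed": 2, "sub_sampled": 4, "processed": 4}


def command_message(command: str, params: bytes = b"") -> bytes:
    """Build a Callisto command message, including the 0x7E marker byte."""
    if command in ("unprocessed", "sub_pixel") and not params:
        raise ValueError(f"{command} read must supply its arguments")
    control = 0b001 | (COMMAND_CODES[command] << 3)   # T = 001, C[1:0]
    if params:
        control |= 1 << 5                             # P = 1
    message = bytearray([0x7E, control])
    if command == "sub_pixel":
        if len(params) % 3 != 0 or len(params) > 255:
            raise ValueError("need a multiple of 3 coordinate bytes, at most 255")
        message.append(len(params) - 1)               # pbcount = bytes less one
    elif params and len(params) != FIXED_ARGS[command]:
        raise ValueError(f"{command} read takes {FIXED_ARGS[command]} argument bytes")
    message += params
    return bytes(message)
```

A sub-pixel read with 8 coordinates would carry 24 coordinate bytes and therefore a pbcount of 23.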

Callisto Command Processing

The commands processed by Callisto are embedded in messages input using the serial interface. In normal circumstances Callisto processes commands immediately upon receipt using whatever image data is in its frame store buffer at the time. There are however some boundary conditions that cause Callisto to not follow this “normal” behaviour. These conditions occur at frame boundaries.

Initially, after reset, the frame store buffer will be empty, and Callisto will be disabled and will not process received commands. Once Callisto is enabled, and when the frame store buffer contains a complete frame, command execution begins and further writing to the frame store buffer is disabled. This condition continues until Callisto receives a finished frame processing message. This indicates that processing of the current frame has finished. At this point the frame store buffer is unlocked, and command execution is locked until the next frame window is written into the buffer. FIG. 62 shows the state transitions and states for command execution and frame store writing.

Frame Store Buffer

The frame store buffer is where image data from the sensor is stored while Callisto is performing image processing operations on that data. The frame store buffer is considered to be either “locked” or “unlocked”. In its unlocked state, the frame store buffer is able to accept image data from the image sensor, while in its locked state it is not (see FIG. 62 above). The frame store buffer becomes locked when the currently defined sensor window is completely written into the buffer, and not when all the data from the image sensor has been received. FIG. 63 shows when the buffer is locked.
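The locked and unlocked behaviour can be summarised as a two-state model. The class below is a toy sketch for reasoning about the protocol, not a description of Callisto's internal logic; the class and method names are hypothetical, and the Enable bit is deliberately left out of the model.

```python
class FrameStoreModel:
    """Toy model of the frame store lock described above."""

    def __init__(self) -> None:
        self.locked = False  # after reset the buffer is empty and unlocked

    def sensor_window_written(self) -> None:
        # The buffer locks once the configured sensor window is complete.
        self.locked = True

    def finished_frame_processing(self) -> None:
        # A "finished frame processing" message unlocks the buffer.
        self.locked = False

    def accepts_sensor_data(self) -> bool:
        return not self.locked

    def can_execute_commands(self) -> bool:
        # Commands need a complete frame, i.e. a locked buffer.
        return self.locked
```

In this model exactly one of `accepts_sensor_data()` and `can_execute_commands()` is true at any time, which mirrors the mutual exclusion between frame store writing and command execution.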

Issuing Callisto Requests

For requests that return data, i.e. Callisto commands, register reads and ready to receive a new frame, the processor may only have a single request outstanding at any one time; the processor must wait until it has received the data output of the current request before issuing a new request.

For requests that do not return any data, e.g. register writes, the processor does not have to wait and may issue these requests at whatever rate it wishes.

Callisto is unable to honour a command request if its frame store buffer is not full, as this will result in an image data underflow error. Callisto can process register access requests and frame synchronisation requests when the buffer is not full.
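Host software can enforce the single-outstanding-request rule with a small guard object. The sketch below is one possible approach; the class name, the method names and the request kind strings are hypothetical.

```python
class RequestTracker:
    """Host-side guard allowing one data-returning request at a time."""

    # Request kinds that produce a response, per the text above.
    RETURNS_DATA = {"command", "register_read", "ready_for_new_frame"}

    def __init__(self) -> None:
        self.outstanding = None  # kind of the request awaiting a response

    def issue(self, kind: str) -> None:
        if kind in self.RETURNS_DATA:
            if self.outstanding is not None:
                raise RuntimeError(
                    f"{self.outstanding} still outstanding; wait for its response")
            self.outstanding = kind
        # Requests that return no data (e.g. register writes) pass through.

    def response_received(self) -> None:
        self.outstanding = None
```

Calling `issue` before each transmission and `response_received` after each response keeps the host from issuing a second data-returning request too early.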

Command Execution Performance

Output Data Rates

For all commands except sub-pixel read, the output data as a result of executing a command is produced without interruption at the full serial interface rate. In the case of the sub-pixel read command, the sub-pixel values returned as a result of command execution are produced without interruption at one third the full serial interface rate. The reason for this is that the calculation of each sub-pixel byte value requires a three-byte coordinate value; Callisto must wait for the full coordinate to be received before it can calculate the single-byte result.

The exception to the above is the case of processed image and sub-sampled image read commands when the regions used are small. In this case the output data rate falls below 100% of the full serial interface data rate. The following table shows the output data rate for region widths less than 10 pixels and heights less than 8 pixels, expressed as a percentage of the full serial data rate.

Data output rates for small region sizes

| Region Width | Region Height | Output Data Rate |
|---|---|---|
| 0-9 | 8+ | 50%-60% |
| 10+ | 0-7 | 45%-50% |
| 0-9 | 0-7 | 20% |

Latency

The table below shows execution latencies for each command expressed in number of serial clock cycles. Latency times are measured from the receipt of the start bit for the first byte of the message that contains the command, to the transmission of the start bit for the first byte of the message that contains the command response.

Command latencies

| Command | Execution Latency |
|---|---|
| Image read (without parameters) | 30-40 clocks |
| Image read (with parameters) | 50-70 clocks |
| Register read | 30-40 clocks |
| Receive new frame | 25-30 clocks |
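Since the latencies are given in serial clock cycles, the corresponding time depends on the sclk frequency in use. A one-line conversion, shown here with a hypothetical function name, makes the relationship explicit.

```python
def clocks_to_microseconds(clocks: int, sclk_hz: float) -> float:
    """Convert a latency in serial clock cycles to microseconds."""
    return clocks * 1e6 / sclk_hz
```

At the 40 MHz maximum sclk frequency stated for the serial interface, a latency of 30 to 40 clocks corresponds to 0.75 to 1.0 microseconds; at lower clock rates the latency grows in proportion.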

Error Detection and Recovery

When Callisto is active and executing commands, there are several events that it will consider to be errors. If any of these events occur, Callisto ceases command execution, initiates a break condition on the serial interface to indicate to the processor that an error has occurred, and will not be able to resume normal operation until the error recovery cycle is complete. FIG. 64 shows the error recovery cycle. The events that put Callisto into an error state are shown in the following table:

Callisto error conditions

| Error Condition | Comments |
|---|---|
| Message out of sync | This condition occurs when Callisto is no longer able to determine where messages begin and end. |
| Malformed message | When a Callisto command is malformed. An example of this may be when Callisto is expecting command arguments and none were supplied. Malformed messages are defined in the list following this table. |
| Malformed byte | Occurs when a stop bit is not found in the correct position. |
| Command overflow | This condition occurs when Callisto is busy processing a message which produces a response and receives a new message requiring a response. |
| Image data underflow | Callisto receives a command but the frame store buffer doesn't contain a complete frame, i.e. isn't locked. |

Definition of malformed messages:

1. All messages:
    (a) illegal message type.
2. Register Access Messages:
    (a) a read access and num_write_bytes /= “00”.
    (b) not_used field /= ‘0’.
    (c) illegal internal register address value.
    (d) illegal external register address value.
    (e) internal access, num_write_bytes inconsistent with address.
3. Image Command Messages:
    (a) not_used field /= “00”.
    (b) unprocessed read with P /= ‘1’.
    (c) subpixel read with P /= ‘1’.
    (d) subpixel read where (pbcount+1) not divisible by 3.
4. Frame Sync Messages:
    (a) illegal control byte type.
    (b) interrupt bit /= ‘0’.
    (c) not_used field /= “00”.
5. Frame Store Write Messages:
    (a) not_used field /= “0000”.
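A host can avoid the malformed message error by checking a command before sending it. The sketch below applies the image command rules from the malformed message definition to a control byte and optional pbcount value. The function name is hypothetical, and the check covers only image command messages, not the other message types.

```python
def image_command_errors(control: int, pbcount=None) -> list:
    """List the malformed-message rules a Callisto command would break."""
    if control & 0b111 != 0b001:
        raise ValueError("not a Callisto command control byte")
    errors = []
    command = (control >> 3) & 0b11   # C[1:0]
    p_bit = (control >> 5) & 1        # P
    if (control >> 6) & 0b11:         # bits 7:6 are not used
        errors.append("not_used field /= 00")
    if command == 0b00 and not p_bit:
        errors.append("unprocessed read with P /= 1")
    if command == 0b01:
        if not p_bit:
            errors.append("subpixel read with P /= 1")
        if pbcount is not None and (pbcount + 1) % 3 != 0:
            errors.append("subpixel read where (pbcount+1) not divisible by 3")
    return errors
```

An empty list means none of the image command rules is violated; it does not guarantee that Callisto will accept the message, since the other error conditions still apply.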

Image Sensor Interface

Data Interface

The interface to the image sensor relies on external control of image sensor timing, i.e. Callisto does not control the image sensor timing or sequencing. Callisto relies on the image sensor interface telling it when there is a new frame to be read from the sensor, and then relies on the interface telling it when there is valid pixel data. See the “Interfaces” chapter for timing details.

Two parameters affect how the image sensor interface behaves: the Image Sensor Window setting, and the Image Sensor Size setting. Both these parameters are located in Callisto registers.

The Image Sensor Window setting controls which part of the total image data Callisto is to write to its frame store buffer. Data outside this window is ignored by Callisto, i.e. not written to the frame store buffer.

The Image Sensor Size setting tells Callisto the size of the image sensor array, and so how much data to expect in a frame. This parameter is needed in conjunction with the window setting in order to work out which data to save and which data to ignore.

Timing Interface

Callisto provides two signals, and possibly a third, to control the image sensor to which it is connected and an external flash. The two output signals are expose and flash. A third signal, capture, can either be generated by Callisto and used internally or provided as an input. The timings of expose and flash are defined relative to capture and are defined by the delay from the rising edge of capture as well as how long each signal is asserted. The timings of these two signals may be defined independently of each other.

All of Callisto's image sensor timing signals are inactive whenever Callisto is inactive, i.e. when the Enable bit in the configuration register is set to ‘0’.

When Callisto is configured to generate the timing for the capture signal internally, the user defines the period of the capture signal, defining the length of time between pulses. The first capture pulse is generated immediately after the enable bit is set in the configuration register.

External Register Interface

Callisto may be used to control the reading from, and writing to, registers in other devices. To this end Callisto provides a generic register read/write bus that allows it to gain access to registers in other devices. Register access commands used on Callisto's serial interface allow the user to specify whether a register operation is “internal” or “external.” Internal register accesses are used to access Callisto registers, and external accesses are used to gain access to registers in the external device, and initiate transactions on the external register interface.

This interface is asynchronous and expects the external device to observe a handshaking protocol.

Power Management

Callisto has a low power mode where the serial interface and external image sensor timing signals remain active. In this mode the user is able to access Callisto registers. This low power mode can be entered in one of two ways. The first is to set the LowPower bit in the configuration register. When this occurs Callisto will remain in low power mode until the LowPower bit is cleared.

The second way Callisto enters its low power mode occurs when the AutoSleep bit in the configuration register is set. In this case low power mode will be entered when Callisto becomes inactive, and Callisto will leave this state when there is some activity for it to perform.

The “inactive” state is entered when Callisto has finished processing the current frame, which corresponds to having received the “finished frame processing” message.

The “active” state is entered when Callisto has received indication, from the image sensor, that a new frame is available. This occurs when the isync signal is asserted.

Callisto Interfaces

PINOUT

The following table shows all input and output signals on Callisto.

General control interface signals:

| Signal name | Width | Description | Direction |
|---|---|---|---|
| resetb | 1 | Asynchronous system reset. | input |
| ten | 1 | Test enable. | input |
| tmode | 1 | Test mode. | input |
| sen | 1 | Scan enable. | input |
| sclk | 1 | Serial clock. | input |
| txd/sout | 1 | Serial output data or scan output data. | output |
| rxd/sin | 1 | Serial input data or scan input data. | input |
| iclk | 1 | Image sensor clock. | input |
| isync | 1 | Image sensor frame synch. | input |
| ivalid | 1 | Image sensor pixel valid. | input |
| idata | 8 | Image sensor pixel data. | input |
| capture | 1 | Input version of image sensor capture/flash timing reference signal. This signal may also be (optionally) internally generated. | input |
| flash | 1 | External flash control signal. | output |
| expose | 1 | Image sensor exposure control signal. | output |
| rvalid | 1 | Register interface valid. | output |
| rwr | 1 | Register interface write. | output |
| raddr | 8 | Register interface address. | output |
| rdatai | 32 | Register interface input data. | input |
| rdatao | 32 | Register interface output data. | output |
| rack | 1 | Register interface acknowledgment. | input |
| rnak | 1 | Register interface negative acknowledgment. | input |
| TOTAL | 96 | | |

GENERAL CONTROL AND TEST INTERFACE

General control and test interface signals

| Signal name | Description | Direction |
|---|---|---|
| resetb | System reset. Active when driven low. Asynchronous to main system clock sclk. | input |
| ten | Test enable. When driven high enables image data to serial data testing. | input |
| tmode | Test mode. When driven high puts Callisto into test mode, specifically for scan testing and BIST. | input |
| sen | Scan enable. When driven high scan testing is enabled. In this mode the serial interface data signals txd and rxd become scan data signals. In this mode sclk is used as the scan clock. | input |
| sin | Scan input data. Multiplexed with the serial data input signal rxd when sen = ‘1’. | input |
| sout | Scan output data. Multiplexed with the serial data output signal txd when sen = ‘1’. | output |

FIG. 65 shows Callisto's reset timing. resetb must be held low for at least 3 cycles of the slower of the two clocks, sclk and iclk.

Test Mode Definitions

ten—Test Enable. When Asserted:

Forces idata to be serialized and output from txd (see section 3.4).

Ignores all commands/accesses except for register writes.

sen—Scan Enable. When Asserted:

Forces every flip-flop in the design into one large shift register

tmode—Test Mode. When Asserted:

Forces all derived clocks to be sourced from sclk.

Forces an xor-based bypass of RAM I/O. Outputs of RAMs are wired to the RAM inputs through an xor structure so that RAM outputs can be controlled during scan.

Forces async reset trees to be controlled via reset pin (i.e. bypassing synchronization). Reset is synchronised to target clock domain during normal operation, but this must be disabled during scan as these reset sync flip-flops are also in the scan chain. If this bypassing didn't occur the global synchronised reset signals may accidentally be triggered during scan.

Test pin settings

| Device Mode | sen | tmode | ten |
|---|---|---|---|
| Functional | 0 | 0 | 0 |
| Image data to serial | 0 | 0 | 1 |
| Scan testing | 0/1 | 1 | 0 |
| BIST testing | 0 | 1 | 0 |

IMAGE SENSOR DATA INTERFACE

Image sensor interface signals

| Signal name | Description | Direction |
|---|---|---|
| iclk | Image sensor interface clock. Maximum frequency is 50 MHz. Note: iclk must always be running. | input |
| isync | Image sensor sync. Indicates the image sensor has captured a new frame. | input |
| ivalid | Image sensor data valid. When high, indicates valid data on the idata bus. Goes high after isync is asserted. | input |
| idata[7:0] | Image sensor data. Byte-wise data from image sensor. Valid when ivalid is asserted. | input |

FIG. 66 shows the timing for the image sensor interface. isync is asserted to indicate that the image sensor has captured a new frame. ivalid is asserted to indicate that valid pixel data is now available on idata. ivalid is asserted for each iclk cycle during which there is valid pixel data on idata. isync must be high for at least one clock cycle and may stay high for the entire frame transfer.

IMAGE SENSOR TIMING INTERFACE

Image sensor interface signals

| Signal name | Description | Direction |
|---|---|---|
| capture | Image sensor capture and flash timing reference signal. | input |
| flash | Controls the flash. | output |
| expose | Controls frame capture for the image sensor. | output |

FIG. 67 shows the timings for image sensor control signals. All of the time parameters are in units of iclk clock cycles, and are defined by setting their values in the appropriate Callisto register. The parameter t1 is only definable when capture is an internal signal. The capture signal is synchronous to iclk and has a pulse width of 1 iclk period.

FIG. 68 shows the timing for the external capture signal, which must be asserted for at least 1 iclk cycle when active.

SERIAL INTERFACE

Serial interface signals

Signal name (direction): Description
sclk (input): Serial clock. Maximum frequency is 40 MHz.
txd (output): Transmit data.
rxd (input): Receive data.

FIGS. 69 and 70 show the operation of the serial interface in synchronous mode. Shown here is a back-to-back transfer of 2 bytes from Callisto to the microprocessor on txd using a single stop bit. Also shown is the transfer of a byte from the microprocessor to Callisto on rxd, also using a single stop bit.

Error Recovery Timing Using Break

FIG. 71 shows the timing for error recovery. When Callisto encounters an error, it signals this condition by holding the txd signal low (for at least 10 sclk cycles). This will violate the ‘0’ start bit, ‘1’ stop bit requirement and will raise a microprocessor interrupt. This is the break condition. Once the microprocessor detects the break it will then also generate a break condition on rxd. Callisto acknowledges this by driving txd high, and the process is completed by the microprocessor driving rxd high.
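The break condition described above can be recognised by counting consecutive low samples on the line. The following Python fragment is a minimal sketch only: the function name find_break and the one-sample-per-sclk-cycle input are assumptions made for illustration and are not part of the Callisto design.

```python
def find_break(samples, min_low_cycles=10):
    """Return the sample index at which a break is first recognised, or None.

    `samples` holds the line level (0 or 1) for each sclk cycle. A start bit
    followed by eight zero data bits gives at most nine low cycles before the
    '1' stop bit, so ten consecutive low cycles cannot be a valid byte.
    """
    low_run = 0
    for index, level in enumerate(samples):
        low_run = low_run + 1 if level == 0 else 0
        if low_run >= min_low_cycles:
            return index
    return None
```

An all-zero data byte (nine low cycles followed by the stop bit) is therefore not mistaken for a break.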

EXTERNAL REGISTER INTERFACE

External register interface signals

Signal name (direction): Description
rvalid (output): Register bus valid. High whenever a read or write operation is occurring. Validates raddr and rdatao.
rwr (output): Register bus write. When high indicates the current operation is a register write.
rack (input): Register bus ack. Signals to Callisto end of register access cycle.
rnak (input): Register bus negative ack. Has same behavior as rack in that it is a handshaking signal to end a transaction. It is asserted instead of rack to indicate that an error has occurred during the transaction, and that it could not be carried out.
raddr[7:0] (output): Register bus address. Indicates the address of the register being accessed.
rdatai[31:0] (input): Register bus data in. Data bus driven by slave device. Used for register reads.
rdatao[31:0] (output): Register bus data out. Data to be written to a register during a write, when rwr is high.

FIG. 72 shows the timing for a read cycle on the external register interface. The read cycle begins by validating the address (raddr) by driving rvalid high, together with driving rwr low. The target device acknowledges that it has put the addressed data onto rdatai by driving rack high. rack then remains high until Callisto drives rvalid low again. This signals the end of the transaction.

FIG. 73 shows the timing for an external register write. Callisto signals the start of the cycle by validating the address and data to be written (raddr and rdatao) by driving rvalid high, together with driving rwr high. The target device acknowledges the write by driving rack high. rack then remains high until Callisto drives rvalid low again. This signals the end of the transaction. If the rnak signal is asserted to complete a transaction, that means there was an error in the external device and the transaction could not be completed successfully.

Note that either rack or rnak should be asserted, and not both simultaneously.
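The handshake rule above can be summarised in a few lines of Python. This is an illustrative sketch only; the function name and the returned strings are hypothetical and do not appear in the design.

```python
def finish_register_access(rack, rnak):
    """Classify the end of an external register access from rack and rnak."""
    if rack and rnak:
        # The interface requires exactly one of the two to end a transaction.
        raise ValueError("rack and rnak must not be asserted together")
    if rack:
        return "ok"        # target acknowledged the read or write
    if rnak:
        return "error"     # target could not carry out the transaction
    return "pending"       # neither asserted yet: the cycle is still open
```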

Operation

REGISTERS

This section describes Callisto's registers.

Configuration Register

This is a general Callisto configuration register.

Configuration Register - 8 bit

Field (width, bits, reset value): Description
Enable (1, 0, b′0): Enable. Setting this bit to ‘1’ enables Callisto operation. Callisto will perform no command processing or frame store writing while this bit is set to ‘0’, but will still respond to register accesses.
ComExRst (1, 1, b′0): Command Execution Restart. When set to ‘1’ causes Callisto to immediately stop command processing and return to its initial processing state. This bit is self clearing.
PadBit (1, 2, b′0): Padding Bit. Value to use when padding bytes as a result of reading a full processed image. The padding is used to align the start of image lines with byte boundaries.
BistStart (1, 3, b′0): BIST Start. Instructs Callisto to perform BIST testing of its RAMs. This bit is self clearing.
CaptureIn (1, 4, b′0): Capture Input. When set to ‘1’ the capture signal is supplied externally, otherwise it is internally generated.
LowPower (1, 5, b′0): Low Power Mode. When this bit is set to ‘1’ Callisto enters its low power state.
AutoSleep (1, 6, b′0): Auto Sleep and Wakeup. When this bit is set to ‘1’ Callisto will automatically enter its low power state when inactive, and return to its normal state when active again.
WriteFrame (1, 7, b′0): Write Frame. Setting this bit to ‘1’ enables direct writing to the frame store buffer.
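For host software, the bit positions listed above might be captured as named flags. The following Python sketch assumes nothing beyond the table; the class name, member names and the helper config_byte are hypothetical.

```python
from enum import IntFlag

class Config(IntFlag):
    """Bit positions of the 8-bit configuration register."""
    ENABLE = 1 << 0
    COM_EX_RST = 1 << 1
    PAD_BIT = 1 << 2
    BIST_START = 1 << 3
    CAPTURE_IN = 1 << 4
    LOW_POWER = 1 << 5
    AUTO_SLEEP = 1 << 6
    WRITE_FRAME = 1 << 7

def config_byte(*flags):
    """Combine any number of flags into the byte to be written."""
    value = 0
    for flag in flags:
        value |= int(flag)
    return value
```

For example, config_byte(Config.ENABLE, Config.WRITE_FRAME) gives 0x81, the value that enables Callisto with direct frame store writing selected.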

Status Register

Callisto status register. This register is clear on read (COR).

Status Register - 16 bit

Field (type, width, bits, reset value): Description
ErrCond (COR, 3, 2:0, b′000): Last Error Condition. Indicates the error that occurred that put Callisto into an error state.
    “000” - No error
    “001” - Message out of sync
    “010” - Malformed message
    “011” - Malformed byte
    “100” - Command overflow
    “101” - Image data underflow
FrameMiss (COR, 2, 4:3, b′00): Missed Frames. Indicates that new frames were available to be written into the frame store buffer but Callisto was unable to do so because it was in the command execution state.
    “00” - No frames missed
    “01” - One frame missed
    “10” - Two frames missed
    “11” - Three or more frames missed
BistFail (COR, 6, 10:5, 0x0): BIST Failure. Result of running built in self test on 4 internal RAMs. ‘0’ - BIST passed, ‘1’ - BIST failed. Bit allocation:
    0 - Frame Store Buffer 1
    1 - Frame Store Buffer 2
    2 - Sub-sample Buffer 1, RAM 1
    3 - Sub-sample Buffer 1, RAM 2
    4 - Sub-sample Buffer 2, RAM 1
    5 - Sub-sample Buffer 2, RAM 2
BistComplete (COR, 1, 11, b′0): BIST Complete. When ‘1’ indicates that BIST has completed.
AutoSleepStat (-, 1, 12, b′0): Auto Sleep Status. When ‘1’ indicates that Callisto is in its low power state.
N/A (-, 3, 15:13, -): Not Used.
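A 16-bit status value read from this register could be unpacked on the host as sketched below in Python. The function name decode_status and the dictionary keys are hypothetical; only the field positions and code meanings are taken from the table.

```python
ERR_COND = {
    0b000: "No error",
    0b001: "Message out of sync",
    0b010: "Malformed message",
    0b011: "Malformed byte",
    0b100: "Command overflow",
    0b101: "Image data underflow",
}

BIST_RAMS = [
    "Frame Store Buffer 1",
    "Frame Store Buffer 2",
    "Sub-sample Buffer 1, RAM 1",
    "Sub-sample Buffer 1, RAM 2",
    "Sub-sample Buffer 2, RAM 1",
    "Sub-sample Buffer 2, RAM 2",
]

def decode_status(value):
    """Split a 16-bit status register value into its fields."""
    return {
        "err_cond": ERR_COND.get(value & 0x7, "Unlisted code"),
        # FrameMiss saturates: a field value of 3 means three or more frames.
        "frame_miss": (value >> 3) & 0x3,
        "bist_failed": [name for bit, name in enumerate(BIST_RAMS)
                        if (value >> (5 + bit)) & 1],
        "bist_complete": bool((value >> 11) & 1),
        "auto_sleep": bool((value >> 12) & 1),
    }
```

Because the register is clear on read, the host should decode the value it received rather than read the register a second time.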

Threshold Register - 8 bit

Field (width, bits, reset value): Description
Threshold (8, 7:0, 0x00): Threshold value used in dynamic range expansion and thresholding process. Expressed as t/255 where t is the desired threshold level. Represented as a 0.8 fixed-point value.

Unprocessed Image Size Register

This register is used to define the size of the region used in the unprocessed image read command.

Unprocessed Image Region Register - 16 bit

Field (width, bits, reset value): Description
SizeX (8, 7:0, 0x00): Size - 1 of region in X direction.
SizeY (8, 15:8, 0x00): Size - 1 of region in Y direction.

Processed Image Region Register

Defines the rectangular region to be used in the full processed image read command, and the sub-sampled image read command.

Image Region Size Register - 32 bit

Field (width, bits, reset value): Description
TopLeftX (8, 7:0, 0x00): X coordinate of top left hand corner of region.
TopLeftY (8, 15:8, 0x00): Y coordinate of top left hand corner of region.
SizeX (8, 23:16, 0x00): Size - 1 of region in X direction.
SizeY (8, 31:24, 0x00): Size - 1 of region in Y direction.
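The field layout above can be packed into a single 32-bit word as in the following Python sketch. The function name pack_region and its width and height arguments (given in pixels, before the "minus one" adjustment) are assumptions made for illustration.

```python
def pack_region(top_left_x, top_left_y, width, height):
    """Build the 32-bit region value; sizes are stored as size minus one."""
    fields = (top_left_x, top_left_y, width - 1, height - 1)
    if any(not 0 <= field <= 0xFF for field in fields):
        raise ValueError("each field must fit in 8 bits")
    x, y, size_x, size_y = fields
    return x | (y << 8) | (size_x << 16) | (size_y << 24)
```

A 10 by 10 region at the origin packs to 0x09090000.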

Image Sensor Window Register

This register defines the window used across the image sensor interface. Data outside of the defined window is “dropped,” and not written into the frame store buffer.

Image Sensor Window Register - 32 bit

Field (width, bits, reset value): Description
TopLeftX (8, 7:0, 0x00): X coordinate of top left hand corner of window.
TopLeftY (8, 15:8, 0x00): Y coordinate of top left hand corner of window.
SizeX (8, 23:16, 0x00): Size - 1 of window in X direction.
SizeY (8, 31:24, 0x00): Size - 1 of window in Y direction.

Image Sensor Size Register - 16 bit

Field (width, bits, reset value): Description
SizeX (8, 7:0, 0x00): Size - 1 of image sensor in X direction.
SizeY (8, 15:8, 0x00): Size - 1 of image sensor in Y direction.

Capture Period Register - 24 bit

Field (width, bits, reset value): Description
CapturePeriod (24, 23:0, 0x00): Defines the period of the capture signal in number of iclk cycles (t1). If set to zero then capture cycle is disabled.

Expose Timing Register - 32 bit

Field (width, bits, reset value): Description
Delay (16, 15:0, 0x00): Defines the delay (minus one) after capture before expose signal is asserted, in number of iclk cycles (t2).
HighTime (16, 31:16, 0x00): Defines how long (minus one) expose is asserted, in iclk cycles (t3).

Flash Timing Register - 32 bit

Field (width, bits, reset value): Description
Delay (16, 15:0, 0x00): Defines the delay (minus one) after capture before flash signal is asserted, in number of iclk cycles (t4).
HighTime (16, 31:16, 0x00): Defines how long (minus one) flash is asserted, in iclk cycles (t5).
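The Expose Timing and Flash Timing registers share one layout, so a single helper could compute either value from cycle counts. The Python below is a sketch under that reading; pack_timing is a hypothetical name and its arguments are the desired delay and high time in iclk cycles.

```python
def pack_timing(delay_cycles, high_cycles):
    """Build a 32-bit timing value: Delay in bits 15:0, HighTime in 31:16.

    Both fields are stored as the cycle count minus one, so the
    representable range for each is 1 to 65536 iclk cycles.
    """
    for cycles in (delay_cycles, high_cycles):
        if not 1 <= cycles <= 0x10000:
            raise ValueError("cycle count must be between 1 and 65536")
    return (delay_cycles - 1) | ((high_cycles - 1) << 16)
```

For a delay of 100 cycles and a high time of 50 cycles the register value is 0x00310063.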

Chip ID Register - 8 bit

Field (width, bits, reset value): Description
RamWidth (8, 7:0, TBD, see note 1): RAM Width. Identifies the width (minus 1, in bytes) of the frame store buffer.
BuffMode (1, 8, TBD, see note 2): Buffering Mode. This bit indicates whether the design uses single or double buffering:
    0 - Single Buffering
    1 - Double Buffering
Id (7, 15:9, 0x00): Chip Identifier. Identifies the design. Callisto's value is 0x00.

Note 1: RamWidth value is defined when the chip is manufactured, and is readable on reset.
Note 2: BuffMode value is defined when the chip is manufactured, and is readable on reset.

INITIALISATION

After reset, Callisto is in a state where all of its configuration registers contain their reset values defined above; Callisto is disabled, making it unable to perform any image processing. It is not until the Enable bit in the configuration register is set to ‘1’ after reset, by a register write, that Callisto begins performing any of its functions.

Before enabling Callisto by setting the Enable bit, any other fixed parameters should also be set.

While Callisto is disabled, i.e. Enable is set to ‘0’, Callisto does not process any commands or write image sensor data into its frame store, and only responds to register access messages.

NORMAL OPERATION

During normal operation Callisto is notified of new frames captured by the image sensor. These frames are written into Callisto's frame store buffer. The timing and triggering of image capture by the sensor is outside of Callisto's control. It is simply told when new frames are available.

Once a captured image has been written to the frame store buffer, the user may ask Callisto to perform commands. This is done by sending Callisto a command message. Parameters for commands may be supplied with the command, in the message, or may be taken from a command-specific register. This second option saves the user having to keep defining parameters when they are issuing multiple commands with the same arguments. When parameters are sent with the command they are not persistently stored, i.e. they do not get written into the command-specific registers. Only an explicit register write can do this.

For commands that have long sequences of parameters, like the sub-pixel read command, the arguments are used as they arrive. Results are generated immediately, meaning that the results of a sub-pixel read command may start appearing on the serial interface before all the parameters (sub-pixel coordinates) have been received.

Frame Processing

The following pseudo code fragment highlights the steps involved in processing each frame. This code would be executed on the processor at the other end of the serial interface.

    while TRUE loop
        sendMsg(readyForNewFrame);
        waitMsg(receivedNewFrame);
        processImage(frame);
        sendMsg(finishedProcessingFrame);
    end loop;

Message Abutment

Commands that do not return any data immediately, such as register writes, may be positioned immediately after another command without the need for that command to have finished execution. Any command may be positioned immediately after another command which doesn't return any data. This section contains some pseudo-code segments to demonstrate this.

Normally, a command must finish execution before the next command can be sent:

    sendMsg(unprocessedImageRead);
    // must wait for command execution to finish
    waitMsg(unprocessedImageReadData);
    registerRead.address = 0x01;
    sendMsg(registerRead);

In this example, the code waits for the response of the unprocessedImageRead command before sending a request to execute a registerRead command.

Register Writes

Register writes take effect immediately after the message is received by Callisto, so care must be taken to ensure that the write does not adversely affect any command in progress.

If a register write immediately follows another command there is no need to wait for its response:

    sendMsg(unprocessedImageRead);
    // no need to wait for command execution to finish
    registerWrite.address = 0x03;
    registerWrite.data = 0xff;
    registerWrite.length = 1;
    sendMsg(registerWrite);

Frame Synchronisation

The FinishedFrameProcessing message does not generate a response so can be abutted against another command, typically the final command in processing a frame.

    subPixelRead.xCoord[0] = 1.5;
    subPixelRead.yCoord[0] = 2.75;
    subPixelRead.xCoord[1] = 3.75;
    subPixelRead.yCoord[1] = 3.5;
    subPixelRead.xCoord[2] = 12.25;
    subPixelRead.yCoord[2] = 27.75;
    subPixelRead.numCoords = 3;
    sendMsg(subPixelRead); // last processing command for current frame
    // No need to wait
    sendMsg(finishedFrameProcessing);
    // Now must wait for sub-pixel data before ready for a new frame
    waitMsg(subPixelReadData);
    // Signal that we are ready to process a new frame
    sendMsg(readyForNewFrame);
    waitMsg(receivedNewFrame);
    // Processing new frame can now begin
    . . .
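The abutment rules used in the preceding examples can be expressed as a small planner that inserts a wait only where one is required. This Python sketch is illustrative: the function names and the (name, returns_data) tuples are hypothetical, and the planner assumes at most one data-returning command is outstanding at a time.

```python
def may_abut(previous_returns_data, next_returns_data):
    """True if the next command may be sent without waiting for the previous."""
    return (not previous_returns_data) or (not next_returns_data)

def plan_sends(commands):
    """Return send/wait steps for a list of (name, returns_data) commands."""
    steps = []
    pending = None  # data-returning command whose response is still awaited
    for name, returns_data in commands:
        if pending is not None and not may_abut(True, returns_data):
            steps.append("wait " + pending)
            pending = None
        steps.append("send " + name)
        if returns_data:
            pending = name
    if pending is not None:
        steps.append("wait " + pending)
    return steps
```

Applied to a sub-pixel read followed by the finishedFrameProcessing message, the planner sends both before waiting for the sub-pixel data, which matches the sequence shown above.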

WRITING DIRECTLY TO FRAME STORE BUFFER

During normal operation, data going into the frame store buffer comes from an image sensor on the image sensor interface. Callisto has a mode which allows the user to write directly to the frame store buffer. The example below shows writing two 10×10 frames into the frame store buffer.

When switching to direct frame store writing mode it is recommended that the following sequence of operations be used:

Reset Callisto;

Set WriteFrame bit in config register;

Set Enable bit in config register;

Begin writing to frame store.

    configRegister = 0x00;
    registerWrite.address = configRegister;
    registerWrite.data[8] = 1; // set WriteFrame bit
    sendMsg(registerWrite);

    frameStoreWriteMsg.first = 1; // This is the first write of a frame
    frameStoreWriteMsg.data = data[0];
    sendMsg(frameStoreWriteMsg);
    // Wait for the response
    waitMsg(frameStoreWriteResp);

    frameStoreWriteMsg.first = 0; // This is NOT the first write of a frame
    frameStoreWriteMsg.data = data[1];
    sendMsg(frameStoreWriteMsg);
    // Wait for the response
    waitMsg(frameStoreWriteResp);

    frameStoreWriteMsg.data = data[2];
    sendMsg(frameStoreWriteMsg);
    // Wait for the response
    waitMsg(frameStoreWriteResp);
    . . .
    // last word of the frame
    frameStoreWriteMsg.data = data[24];
    sendMsg(frameStoreWriteMsg);
    // Wait for the response
    waitMsg(frameStoreWriteResp);
    . . .
    // Write a new frame into frame store buffer
    frameStoreWriteMsg.first = 1; // This is the first write of a frame
    frameStoreWriteMsg.data = data[0];
    sendMsg(frameStoreWriteMsg);
    // Wait for the response
    waitMsg(frameStoreWriteResp);

    frameStoreWriteMsg.first = 0; // This is NOT the first write of a frame
    frameStoreWriteMsg.data = data[1];
    sendMsg(frameStoreWriteMsg);
    // Wait for the response
    waitMsg(frameStoreWriteResp);

    frameStoreWriteMsg.data = data[2];
    sendMsg(frameStoreWriteMsg);
    // Wait for the response
    waitMsg(frameStoreWriteResp);
    . . .
    // last word of the frame
    frameStoreWriteMsg.data = data[24];
    sendMsg(frameStoreWriteMsg);
    // Wait for the response
    waitMsg(frameStoreWriteResp);

Callisto Design

ARCHITECTURAL OVERVIEW

The architectural partitioning of the Callisto design is illustrated in FIG. 74.

Callisto Top-Level Partitioning

The serialif block performs all message reception, interpretation and transmission. Image command and register accesses received from the user are translated into single command instructions which are sent to the imgproc and config blocks. Subpixel image commands become a series of instructions, one for each coordinate pair. When a message is received that requires a response (image read or register read) the serial interface starts transmitting the message header. The imgproc and config blocks wait before outputting data to the serial interface to ensure the successful transmission of the returning message header.

The config block contains all the configuration registers and the interface to the external registers. Register instructions are received from the serialif block and read data is returned as a rate adapted (at the serial interface bandwidth) byte stream.

The imgproc block controls the image reading functions. It receives a command instruction from the serialif block and performs SRAM reads from either the subsambufs or framebufs blocks. For subpixel and processed read commands, this data is processed before being passed to the serialif block. For unprocessed and subsampled reads, the raw RAM data is sent to the serialif block. The output data is a rate adapted byte stream.

The framebufs block provides double buffered storage for the raw image data. Bytes are written into the frame store buffer from the imgsensif block, and bytes are read by the imgproc block.

The subsambufs block provides double buffered storage for the subsampled image data, which is derived from the incoming image sensor interface. The loading of subsampled data by the imgsensif block involves a read-modify-write operation. This is due not only to the subsambuf word size (70 bits), but also the subsampled value calculation sequence. The wide word size is required to maximize txd utilization during a processed image read. The imgproc block reads from the subsambufs block whilst executing either a subsampled image read or processed image read.

The imgsensif block receives data from the image sensor interface and controls the writing into both the framebufs and subsambufs blocks. It manages the double-buffering swapping mechanism, image windowing and the image data subsampling calculations. Rate adapted image sensor data is passed directly to the serialif during test mode (ten).

The clk_driver block controls the generation of all internal clocks. s_clk and i_clk are the persistent clocks for the serial and image domains respectively. sq_clk and iq_clk are their low-power equivalents and are disabled whenever possible. For the double buffered design, rq_clk[1:0] are the clocks controlling the two swapping SRAM buffers and are also disabled whenever possible. The single buffered design has a single rq_clk[0].

The synch block synchronizes signals crossing the iclk/sclk boundary.

The flash_expose block generates the image sensor timing interface signals flash and expose.

HIERARCHICAL DESCRIPTION

The Callisto design hierarchies for the two different buffering schemes(single and double) are shown below. Each element in the hierarchy isdescribed in the form:<instance_name>:<block_name>(<block_architecture>). callisto_sb:callisto core_0: core(struct) clk_driver_0: clk_driver(rtl) config_0:config(rtl) flash_expose_0: flash_expose(rtl) framebufs_0:framebufs(rtl) framebuf_0: framebuf(rtl) fs_ram_bist_0:fs_ram_bist(struct) fs_ram_0: fs_ram(struct) fs_asic_ram_0:fs_asic_ram(behav) rambist_0: rambist(struct) bist_pattern0:bist_pattern(struct) bist_cmp0: bist_cmp(rtl) bist_fifo0:bist_fifo(struct) bist_fifow0: bist_fifow(rtl) cfgfifo0: cfgfifo(rtl)bist_seq0: bist_seq(rtl) imgproc_0: imgproc(struct) imgproc_fs_0:imgproc_fs(fsm) imgproc_sertim_0: imgproc_sertim(fsm) imgproc_ss_0:imgproc_ss(struct_rtl) imgsensif_0: imgsensif(struct) sens_ctrl_0:sens_ctrl(onebuf) sens_fs_0: sens_fs(rtl) sens_mux_0:sens_mux(struct_rtl) sens_ss_0: sens_ss(rtl) serialif_0:serialif(struct) sif_errhand_0: sif_errhand(rtl) sif_msghand_0:sif_msghand(rtl) sif_msghdrgen_0: sif_msghdrgen(rtl) sif_msgsync_0:sif_msgsync(rtl) sif_par2ser_0: sif_par2ser(rtl) sif_ser2par_0:sif_ser2par(rtl) subsambufs_0: subsambufs(rtl) subsambuf_0:subsambuf(rtl) ss_ram_bist_lo: ss_ram_bist(struct) rambist_0:rambist(struct) bist_pattern0: bist_pattern(struct) bist_cmp0:bist_cmp(rtl) bist_fifo0: bist_fifo(struct) bist_fifow0: bist_fifow(rtl)cfgfifo0: cfgfifo(rtl) bist_seq0: bist_seq(rtl) ss_ram_0: ss_ram(struct)ss_asic_ram_0: ss_asic_ram(behav) ss_ram_bist_hi: ss_ram_bist(struct)rambist_0: rambist(struct) bist_pattern0: bist_pattern(struct)bist_cmp0: bist_cmp(rtl) bist_fifo0: bist_fifo(struct) bist_fifow0:bist_fifow(rtl) cfgfifo0: cfgfifo(rtl) bist_seq0: bist_seq(rtl)ss_ram_0: ss_ram(struct) ss_asic_ram_0: ss_asic_ram(behav) synch_0:synch(struct) reset_sync_s1: reset_sync(rtl) reset_sync_i1:reset_sync(rtl) sig_pulse_sync_new_frame: sig_pulse_sync(rtl)sig_pulse_sync_frame_missed: sig_pulse_sync(rtl) 
sig_pulse_fin_frm_proc:sig_pulse_sync(rtl) sig_pulse_fsw_ack: sig_pulse_sync(rtl)sig_pulse_img_cmd_fs_wr: sig_pulse_sync(rtl)synchronizer_auto_lo_pwr_status: synchronizer(rtl) synchronizer_rack:synchronizer(rtl) synchronizer_rnack: synchronizer(rtl)synchronizer_img_en: synchronizer(rtl) synchronizer_auto_sleep:synchronizer(rtl) callisto_db: callisto core_0: core(struct)clk_driver_0: clk_driver(rtl) config_0: config(rtl) flash_expose_0:flash_expose(rtl) framebufs_0: framebufs(rtl) framebuf_0: framebuf(rtl)fs_ram_bist_0: fs_ram_bist(struct) fs_ram_0: fs_ram(struct)fs_asic_ram_0: fs_asic_ram(behav) rambist_0: rambist(struct)bist_pattern0: bist_pattern(struct) bist_cmp0: bist_cmp(rtl) bist_fifo0:bist_fifo(struct) bist_fifow0: bist_fifow(rtl) cfgfifo0: cfgfifo(rtl)bist_seq0: bist_seq(rtl) framebuf_1: framebuf(rtl) fs_ram_bist_0:fs_ram_bist(struct) fs_ram_0: fs_ram(struct) fs_asic_ram_0:fs_asic_ram(behav) rambist_0: rambist(struct) bist_pattern0:bist_pattern(struct) bist_cmp0: bist_cmp(rtl) bist_fifo0:bist_fifo(struct) bist_fifow0: bist_fifow(rtl) cfgfifo0: cfgfifo(rtl)bist_seq0: bist_seq(rtl) imgproc_0: imgproc(struct) imgproc_fs_0:imgproc_fs(fsm) imgproc_sertim_0: imgproc_sertim(fsm) imgproc_ss_0:imgproc_ss(struct_rtl) imgsensif_0: imgsensif(struct) sens_ctrl_0:sens_ctrl(fsm) sens_fs_0: sens_fs(rtl) sens_mux_0: sens_mux(struct_rtl)sens_ss_0: sens_ss(rtl) serialif_0: serialif(struct) sif_errhand_0:sif_errhand(rtl) sif_msghand_0: sif_msghand(rtl) sif_msghdrgen_0:sif_msghdrgen(rtl) sif_msgsync_0: sif_msgsync(rtl) sif_par2ser_0:sif_par2ser(rtl) sif_ser2par_0: sif_ser2par(rtl) subsambufs_0:subsambufs(rtl) subsambuf_0: subsambuf(rtl) ss_ram_bist_lo:ss_ram_bist(struct) rambist_0: rambist(struct) bist_pattern0:bist_pattern(struct) bist_cmp0: bist_cmp(rtl) bist_fifo0:bist_fifo(struct) bist_fifow0: bist_fifow(rtl) cfgfifo0: cfgfifo(rtl)bist_seq0: bist_seq(rtl) ss_ram_0: ss_ram(struct) ss_asic_ram_0:ss_asic_ram(behav) ss_ram_bist_hi: ss_ram_bist(struct) rambist_0:rambist(struct) 
bist_pattern0: bist_pattern(struct) bist_cmp0:bist_cmp(rtl) bist_fifo0: bist_fifo(struct) bist_fifow0: bist_fifow(rtl)cfgfifo0: cfgfifo(rtl) bist_seq0: bist_seq(rtl) ss_ram_0: ss_ram(struct)ss_asic_ram_0: ss_asic_ram(behav) subsambuf_1: subsambuf(rtl)ss_ram_bist_lo: ss_ram_bist(struct) rambist_0: rambist(struct)bist_pattern0: bist_pattern(struct) bist_cmp0: bist_cmp(rtl) bist_fifo0:bist_fifo(struct) bist_fifow0: bist_fifow(rtl) cfgfifo0: cfgfifo(rtl)bist_seq0: bist_seq(rtl) ss_ram_0: ss_ram(struct) ss_asic_ram_0:ss_asic_ram(behav) ss_ram_bist_hi: ss_ram_bist(struct) rambist_0:rambist(struct) bist_pattern0: bist_pattern(struct) bist_cmp0:bist_cmp(rtl) bist_fifo0: bist_fifo(struct) bist_fifow0: bist_fifow(rtl)cfgfifo0: cfgfifo(rtl) bist_seq0: bist_seq(rtl) ss_ram_0: ss_ram(struct)ss_asic_ram_0: ss_asic_ram(behav) synch_0: synch(struct) reset_sync_s1:reset_sync(rtl) reset_sync_i1: reset_sync(rtl) sig_pulse_sync_new_frame:sig_pulse_sync(rtl) sig_pulse_sync_frame_missed: sig_pulse_sync(rtl)sig_pulse_fin_frm_proc: sig_pulse_sync(rtl) sig_pulse_fsw_ack:sig_pulse_sync(rtl) sig_pulse_img_cmd_fs_wr: sig_pulse_sync(rtl)synchronizer_auto_lo_pwr_status: synchronizer(rtl) synchronizer_rack:synchronizer(rtl) synchronizer_rnack: synchronizer(rtl)synchronizer_img_en: synchronizer(rtl) synchronizer_auto_sleep:synchronizer(rtl)

clk_driver

The clk_driver block drives all the internal clocks used in Callisto. Clock muxing and disabling is performed in this block for the iq_clk, sq_clk and rq_clk[1:0] clocks. Clock enable signals (generated in the serial interface and image sensor circuits) are sampled on the negative edge of their driving clock to avoid glitching during disabling/swapover. When the test mode signal (tmode) is asserted all gated clocks are sourced from sclk to enable successful scan and RAM BIST testing. For architectural details regarding clocking strategy see Section. The clock generation logic is illustrated in FIG. 75.

config

The config block contains the configuration registers and drives/receives the signals of the external register interface.

The configuration registers are stored in a single hierarchical type, indexed via the register address. The cfg signal which is output from this block is a flattened type, allowing for easier use. The status register, due to its clear-on-read nature, is a special case. At the start of a status register read operation, a snapshot of the register is taken. At the same time the register is cleared and then immediately updated with any events from the current clock cycle. This sequence ensures that no events are missed during the read-clear operation. The snapshot value is then used as the read value.
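The snapshot, clear and update sequence described above can be modelled in a few lines of Python. This is a simplified sketch: the class name is hypothetical, and events are modelled as sticky bits in a mask, which is an assumption that ignores the multi-bit status fields.

```python
class ClearOnReadRegister:
    """Behavioural model of a clear-on-read register, stepped once per clock."""

    def __init__(self):
        self.value = 0

    def clock(self, events=0, read=False):
        """Advance one clock cycle.

        `events` is a bit mask of events occurring in this cycle. When `read`
        is true the pre-read value is returned as the snapshot and the
        register restarts from this cycle's events, so none are lost.
        """
        if read:
            snapshot = self.value
            self.value = events      # cleared, then updated in the same cycle
            return snapshot
        self.value |= events
        return None
```

An event that arrives in the same cycle as a read is not part of that read's snapshot, but it is still present for the next read.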

The register_read state machine and associated counter control the read data output. This state machine manages: message header delay; external/internal read delays; variable number of output bytes; the serial interface byte timing; and the reg_read_done output signal. This state machine is illustrated in FIG. 76.

Read data bytes are output from the config block with a fixed cadence of 1 valid byte every ten clocks to match the serial interface data rate. This concept is illustrated with a four byte register read operation in FIG. 76 a.

All external register interface outputs are registered before being output. The (already synchronized) s_rack and s_rnak signals are used to validate the external register interface inputs. The detection of s_rnak asserted is interpreted as an illegal external address error.

serialif

The serialif is a structural block that performs serial interface message reception and transmission. The basic structure of this block is illustrated in FIG. 77.

The serial data received is first converted into bytes by the sif_ser2par block. These bytes are then delineated into messages by the sif_msgsync block. The messages are then interpreted by the sif_msghand block. The sif_msghdrgen generates the headers for transmitted frames. The sif_par2ser block converts the byte streams from the sif_msghdrgen, config and imgproc blocks into a serial bit stream. The sif_errhand block collects and collates all the error messages received by the various serial interface blocks, and controls the serial interface error recovery process.

sif_ser2par

The sif_ser2par block receives the serial bit stream and delineates each byte based on the start and stop bits. On successful delineation the byte is output with an associated valid flag asserted for a single cycle. If rxd is detected to be held low for 10 consecutive cycles (whilst tx_break is asserted) the rx_break_status signal is asserted. This signal is negated when rxd is asserted. If a stop-bit is not found where expected, the start_stop_error signal is asserted. FIG. 78 illustrates the ser2par state machine used to control the serial to parallel conversion.
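Byte delineation by start and stop bits can be illustrated with the Python sketch below. It is not the ser2par state machine of FIG. 78: the function names are borrowed from the block names for readability, the least-significant-bit-first ordering is an assumption (the bit order is not stated here), and dropping a whole frame after a missing stop bit is likewise an assumed recovery policy.

```python
START_BIT, STOP_BIT = 0, 1

def par2ser(byte):
    """Frame one byte as 10 line samples: start bit, 8 data bits, stop bit.

    Assumption: data bits are sent least significant bit first.
    """
    return [START_BIT] + [(byte >> n) & 1 for n in range(8)] + [STOP_BIT]

def ser2par(samples):
    """Delineate bytes from line samples (one sample per clock cycle).

    Returns (received_bytes, start_stop_errors).
    """
    received, errors = [], 0
    i = 0
    while i + 10 <= len(samples):
        if samples[i] != START_BIT:
            i += 1                      # idle line: keep looking for a start bit
            continue
        frame = samples[i:i + 10]
        if frame[9] == STOP_BIT:
            received.append(sum(bit << n for n, bit in enumerate(frame[1:9])))
        else:
            errors += 1                 # stop bit not found where expected
        i += 10                         # assumption: a bad frame is dropped whole
    return received, errors
```

Two back-to-back frames, as in the 2-byte transfer of FIGS. 69 and 70, are recovered without any idle cycles between them.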

sif_msgsync

The sif_msgsync block performs message delineation. The message marker byte (0x5A) is used to obtain and check delineation. The message control byte and subsequent argument bytes are used to determine the message length. The msg_sync state machine and associated byte counter is used to control and track the delineation state. This state machine is illustrated in FIG. 79.

The output data is simply a registered version of the input data, with the addition of a control byte flag bit. The message_sync_error output signal is a single-cycle pulse that is asserted when delineation is lost.

sif_msghand

The sif_msghand block performs received message handling. It interprets the message control byte and any subsequent argument bytes. Malformed messages are deleted and an error signal generated (used by the config block). Valid messages are converted into command words. The msg_hand state machine and associated counters control this operation and this state machine is illustrated in FIG. 80.

Each register access is translated into a single command word on the reg_acc bus. In addition to the rwr, addr, extn and wdata signals the reg_acc bus has a go signal which indicates the start of a valid access. For register read accesses the reg_read_done signal is returned by the config block indicating that all the read data has been sent to the par2ser block. This enables command overflow error detection. A register write followed by a register read operation is illustrated in FIG. 81.

Each image command is translated into a single command word on the img_cmd bus. The subpixel command is the only exception; this command is translated into a series of command words, one for each sub-pixel coordinate (x,y pair). The img_cmd bus consists of six different fields: typ, arg, fgo, go, fs_s and fs_wr. The typ field indicates the image command type. The arg field is a 32-bit bus which carries all the parameter information (topleftX, etc.); this field is loaded with the configuration register values on reception of the message control byte, and then over-written with any message parameters. For non-subpixel image read commands the go and fgo bits are identical and indicate that the previously mentioned typ and arg fields of the img_cmd bus are valid and an image read can start. For subpixel image commands the fgo bit flags the first coordinate pair of a command and the go bit indicates the first and subsequent coordinate pairs for that command. The fs_wr bit (active for a single cycle) indicates that the current data in the arg field is part of a direct frame store write. The fs_s bit indicates the start of a frame store write sequence. A sequence of unprocessed, processed and subsampled image reads is illustrated in FIG. 82. A subpixel image read command is shown in FIG. 83. FIG. 84 illustrates a direct frame store write sequence.

Frame handshaking is also performed by the sif_msghand block. This mechanism controls the generation of the send_rx_new_frm_msg signal (used by the sif_msghdrgen block), the fin_frm_proc pulse (used by the sens_ctrl block) and the clock enables for sq_clk and rq_clk[1:0]. The frame_handshaking state machine is illustrated in FIG. 85.

In addition the sif_msghand block also detects and flags the following message related errors: malformed_msg, cmd_overflow, img_dat_underflow, fsw_nack.

sif_msghdrgen

The sif_msghdrgen block generates the transmitted message header bytes for image read commands, register read commands, frame_sync and frame_store_write_ack messages. This is done by monitoring the commands issued by the sif_msghand block and generating the appropriate message header when it detects either an image read or register read.

The sif_msghdrgen block also generates complete frame-sync and frame-store-write-ack messages based on the send_rx_new_frm_msg, send_fsw_ack_msg and send_fsw_nack_msg signals respectively. The hdr_done signal is generated and used within the imgproc block to indicate that the message header has been sent and image data is able to be transmitted.

The header_generation state machine and associated counters control the generation of the message headers. This state machine is illustrated in FIG. 86.

For image data messages a two-byte message data byte count field is calculated. For image commands, the number of returned image data bytes is calculated using the command arguments (parameters). This involves a size_x by size_y multiplication for the image pixel read commands, and a division by 3 for the subpixel read command. The number of data bytes returned in a register read message is determined via a lookup based on address and whether the register is internal or external.
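The two calculations mentioned above can be sketched in Python as follows. Several points are assumptions rather than statements of the design: the function names are hypothetical, the size arguments are taken to be register values that hold size minus one, the multiplication is applied to the unprocessed image read (one byte per pixel), and the division by 3 is read as three argument bytes per sub-pixel coordinate pair with one byte returned per pair.

```python
def unprocessed_read_byte_count(size_x_reg, size_y_reg):
    """Bytes returned by an unprocessed image read, one byte per pixel.

    Assumption: the arguments are register values holding size minus one.
    """
    return (size_x_reg + 1) * (size_y_reg + 1)

def subpixel_read_byte_count(argument_byte_count):
    """Bytes returned by a sub-pixel read, one per coordinate pair.

    Assumption: each coordinate pair occupies three argument bytes.
    """
    if argument_byte_count % 3 != 0:
        raise ValueError("expected a whole number of 3-byte coordinate pairs")
    return argument_byte_count // 3
```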

Message header bytes are output from this block with a fixed cadence of 1 valid byte every 10 clock periods to match the serial interface data rate.

sif_par2ser

The sif_par2ser block accepts message header, register, stored image and direct image sensor data bytes and converts them to a serial bit stream. When the tx_break input is asserted, normal operation is overridden and the txd output is held at logic zero. When tx_break is negated, txd is held high until the first valid byte is received, at which point normal operation resumes. It is assumed that only one of the four data sources (message header, register read data, stored image data and direct image sensor data) is active at any one time, and that the arriving byte streams are rate-adapted at the serial interface rate of one valid byte every ten sclk periods. This is illustrated in FIG. 87.

The sif_par2ser block samples a valid byte, and the par2ser state_machine and associated counter are used to control generation of the txd sequence: start-bit, serial-bit stream, stop-bit, and any possible tx_break conditions. This state machine is illustrated in FIG. 88.
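The framing described above can be illustrated with a small behavioural model. This is a sketch only, not the par2ser state machine of FIG. 88: the function names are hypothetical, and the start-bit level (0), stop-bit level (1) and LSB-first bit order are assumptions borrowed from conventional asynchronous serial framing rather than taken from the text. The idle-high and break-low levels follow the description.

```python
def frame_byte(value):
    """Return the ten txd levels for one byte: start bit, 8 data bits, stop bit."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in 8 bits")
    data_bits = [(value >> i) & 1 for i in range(8)]  # LSB first (assumption)
    return [0] + data_bits + [1]  # start = 0, stop = 1 (assumption)


def txd_stream(byte_slots, tx_break=False):
    """Model txd over a sequence of 10-sclk slots.

    byte_slots holds one entry per slot: an int for a valid byte, or None
    when no byte is presented in that slot.
    """
    levels = []
    for slot in byte_slots:
        if tx_break:
            levels.extend([0] * 10)   # break overrides normal operation
        elif slot is None:
            levels.extend([1] * 10)   # line idles high
        else:
            levels.extend(frame_byte(slot))
    return levels
```

Each valid byte occupies exactly ten bit periods, which is why the upstream byte sources are rate-adapted to one valid byte every ten sclk periods.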

sif_errhand

The sif_errhand block performs the error protocol management for the serial interface. The error_handler state machine controls the error recovery process. This state machine is illustrated in FIG. 89.

All serial interface errors are input to the sif_errhand block and collated into the sif_error output signal which is then passed to the config block.

Several error related output signals are generated. The stop_cmd_exec signal is a pulse used by the image processing blocks to abort all command processing. The msg_sync_status signal indicates whether the serial interface is in message synchronization. The tx_break signal indicates that the serial interface should transmit the break sequence.

imgproc

FIG. 90 shows a structural block containing the four image processing functions.

Note that the block 78 is not internal to the imgproc block: it is shown here only to indicate the connectivity to the subsambufs and framebufs blocks.

imgproc_fs

Provides the ‘Unprocessed Image Read’ function and the ‘Sub-pixel Read’ function.

The ‘Unprocessed Image Read’ function scans the region provided in the img_cmd, returning one byte for each pixel in the region.

The ‘Sub-pixel Read’ function re-uses some of the same code: it gets the four pixels required by scanning a 2-by-2 region in the same way as ‘Unprocessed Image Read’ scans a region, except that it manipulates and accumulates the data on the way and returns only one byte per “region”. Its state machine is shown in FIG. 91.

Unprocessed Image Read (Function)

For the Unprocessed Image Read function, the Go indication loads counters to produce (x,y) coordinates for the region. The GetByte state is transient and generates an address to the frame buffer. In the Store state, the resulting pixel is stored and the WaitSer state entered. When ser_avail goes active, a byte request, along with the byte, is immediately output. If we are at the end of the region, we return to the Idle state. Otherwise, we update all the counters, moving to the next row if required, and go back to the GetByte state.

Sub-Pixel Read (Function)

For the Sub-Pixel Read function, the Go indication loads counters to produce (x,y) coordinates for the 2×2 region whose top left is at the supplied coordinate.

The GetByte state is transient and generates an address to the frame buffer.

The Store state is also transient, storing the pixel locally for further processing in the Process state, which performs the weighting function on each pixel as it arrives.

After the Process state, if the last pixel has been processed, the resulting sub-pixel value is stored and the WaitSer state entered. When ser_avail goes active, the byte is sent to the serialif block and the Idle state is entered, because we only ever send out one result per region; the Last Byte status is remembered from the Process-to-WaitSer transition.

imgproc_ss

Provides the ‘Sub-sampled Image Read’ function and the ‘Processed Image Read’ function.

Sub-Sampled Image Read (Function)

The ‘Sub-sampled Image Read’ is highly similar to the ‘Unprocessed Image Read’ function, except some multiplexing is required to get the single byte of data out of the 8 bytes returned from the sub-sample buffer.

Processed Image Read (Function)

The ‘Processed Image Read’ function is the most complicated of all the functions.

The required output is a stream of 1-bit pixel values for a specified region. The pixel order is row-by-row, and left to right within each row, with each row's pixels padded out into an integer number of bytes.

FIG. 92 shows the sub-functions of the function. Note that the Sub-Sample Buffer is shown here only to show the cadence of the data.

Address Generator Sub-Function

The algorithm for producing a stream of range-expanded and thresholded pixels in this order involves scanning across each row of the requested region, starting each row from 2 columns before the LHS of the region and ending 2 columns past the RHS of the region. The two rows above and two below are automatically returned for each address generated, so there is no need for these extra rows to be explicitly addressed.

Control info is passed ahead that indicates: which five bytes to use from the eight returned; whether to pad this bit; whether this column is valid; whether or not the first two rows are valid; whether or not to generate a bit for this pixel; and when to send a full byte.

Delay Match Sub-Function

Since the Sub-Sample Buffer returns data in the next cycle, the control info that matches the data must be delayed by one cycle.

Data Alignment and Masking Sub-Function

Takes the 8 bytes from the Sub-Sample Buffer and selects the appropriate 5 rows. Also invalidates bytes that are not wanted in the min-max calculation.

Column Min-Max Generator Sub-Function

At each column, the pixel data and the two bytes above and below are processed to give the min and max values over that 5-byte column. This is shown in FIG. 93.

Column Min-Max Pipeline and Range-Expand and Threshold Sub-Function

These min, max and pixel values are pushed together into a pipeline with the four previous min-max-pixel values. These five pipelined values are then min-maxed to find the min and max over the 5-by-5 region centred around the pixel in the middle of the pipeline. This is shown in FIG. 94.

Because we can read all five bytes for a column in a single cycle, once the pipeline is full, we can produce one auto-level-threshold pixel value per cycle for every cycle after that.
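The two-stage min-max described above (per-column first, then across a five-deep pipeline) can be sketched behaviourally as follows. This is an illustrative model only: the function names are hypothetical, out-of-window bytes and columns are represented by None and simply skipped, and the bottom edge is masked here rather than filled with duplicated rows, which is a simplification that should give the same min-max result.

```python
from collections import deque


def column_min_max(column):
    """Min and max over the valid bytes of one 5-byte column (None = masked)."""
    valid = [v for v in column if v is not None]
    return (min(valid), max(valid)) if valid else None


def window_min_max_row(image, y):
    """Return [(min, max), ...] over the 5x5 window of every pixel in row y."""
    height, width = len(image), len(image[0])
    pipe = deque(maxlen=5)          # five most recent column results
    results = []
    # Sweep from 2 columns before the left edge to 2 columns past the right.
    for x in range(-2, width + 2):
        if 0 <= x < width:
            col = [image[r][x] if 0 <= r < height else None
                   for r in range(y - 2, y + 3)]
            pipe.append(column_min_max(col))
        else:
            pipe.append(None)       # column outside the window is masked
        centre = x - 2              # pixel sitting in the middle of the pipeline
        if len(pipe) == 5 and 0 <= centre < width:
            valid = [mm for mm in pipe if mm is not None]
            results.append((min(lo for lo, _ in valid),
                            max(hi for _, hi in valid)))
    return results
```

Once the pipeline holds five columns, each further column pushed in yields one new 5-by-5 result, which mirrors the one-pixel-per-cycle behaviour described in the text.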

Serial-To-Parallel Sub-Function

Bits are just shifted into an 8-bit shift register and the resulting byte sent to the serialif when requested by the address generator, remembering that the address generator controls the cadence for the entire block, including the output byte stream to the serialif.

Handling Out-Of-Bounds Pixels

When parts of the 5×5 threshold region fall outside the window, these parts need to be excluded from the min-max calculation. This is all controlled at the Address Generator.

a. Top Side

When the row being thresholded is either row 0 or 1, then the two byte rows above the thresholded row in the return value are individually masked as required.

b. Bottom Side

As each row is written from the image sensor side, all the byte lanes lower than the actual one being written are also written with that same value. This means that the last row is duplicated at least two extra times, and these duplicated rows can be used in the min-max calculation without affecting the min-max result.

c. Left Side

The final decision is not made yet. One possibility is to allow negative X values and mask the entire 5-byte result from the min-max calculations if X<0. Another would also allow negative X values, but overwrite the X value in the address calculation to zero if X<0.

d. Right Side

The X coordinate of the current read will be checked against the window width and the resulting 5 bytes masked if it is outside the window.

Padding the Output Byte

When the width of the region is not divisible by 8, padding bits are added at the end of the byte. The process that sweeps across the row actually acts as if the width was divisible by 8, but supplies an extra bit into the pipeline to tell the final stage of the range-expand and threshold function to use the configured padding bit instead.
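The padding rule can be illustrated with a short sketch that packs one row of 1-bit pixels into bytes. The function name is hypothetical, and placing the first pixel of each group in the most significant bit is an assumption; the text does not state the bit order within a byte.

```python
def pack_row(bits, pad_bit=0):
    """Pack one row of 1-bit pixels into bytes, padding the final byte."""
    # Extend the row to a multiple of 8 with the configured padding bit.
    padded = list(bits) + [pad_bit] * (-len(bits) % 8)
    out = []
    for i in range(0, len(padded), 8):
        byte = 0
        for b in padded[i:i + 8]:
            byte = (byte << 1) | b   # first pixel ends up in the MSB (assumption)
        out.append(byte)
    return bytes(out)
```

For example, a 10-pixel row of all ones with a padding bit of 0 packs to the two bytes 0xFF and 0xC0 under these assumptions.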

Achieving 100% Throughput

Due to the requirement to pad the output stream to 8 bits at the end of each row, I will only talk here in terms of possible utilization of the output serial bus, and not the throughput in terms of true, useable data.

The output serial bus will only be less than 100% utilized when the region width is 8 pixels or less.

To achieve 100% throughput across the serial interface, the range-expand and threshold function needs to output (on average) 8 bits every 10 clocks.

During the bulk of long rows, this is not a problem. Once the pipeline has been filled, the range-expand and threshold function can output one bit per cycle. In fact, we have to slow it down to produce only eight bits every ten cycles.

On the other hand, there are two dead cycles at the start of and at the end of each row, so between rows there are four dead cycles.

Noting from before that the address generator always produces a row bit-stream that is divisible by 8, we see how the output bitstream progresses for region widths of 8, 16, 24 and 40 pixels. See FIG. 95.

This figure shows the cadence of the bytes arriving at the centre of the pipeline (see FIG. 94), and the 10-bit output cadence for each 8-bit block.

The 2-cycle Pre-Fill state indicates the pipeline receiving the max-min values of the two columns to the left of the first pixel in the region. Similarly, the 2-cycle Trail state indicates the two columns to the right of the last pixel in the row passing through the centre point as the pipeline is flushed. Note that the Trail state is followed immediately by a new Pre-Fill state: the data for the next row follows right behind the previous row.

The 2-cycle Idle state is used periodically to stop the input data rate exceeding the output rate.

The blocks of 10 bits show how the 8-bit data just collected is output to the serial port. Because the serialif block takes data in 8-bit chunks in a single cycle, then serializes it over 10 cycles, there is no need for a FIFO as such, just a shift register. The address generator ensures that the shift register will never overflow.

imgproc_sertim

The imgproc_sertim block provides the serial timing for the output byte stream, independent of the serialif. It is used by the imgproc_fs and imgproc_ss blocks.

This block thus must be ‘tuned’ to the operating parameters of the serialif block. It basically provides an initial hold-off time at the start of each ‘fgo’ (first-go) for the serialif to send the response pre-amble, then allows one byte out every 10 cycles.

The imgproc_sertim state machine is shown in FIG. 96. Notes for the state machine are as follows:

-   1. FirstGo: This is the ‘fgo’ field of the image command from the serial_if. This basically says: “Wait for the serial_if to send out a command header before you start”.
-   2. When stop_img_cmd=‘1’, this acts as a global reset and overrides other transitions.
-   3. The ser_avail output is ‘1’ only during the ProcByte state. The ByteRequest may come immediately (in the same cycle), so this state may only last for one cycle.
-   5. The HdrWait state will last for 30 cycles. The WaitSer state will last for 9 cycles, and when added to the minimum one ProcByte state, we get the required 10 cycles for every byte.
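The cycle counts in the notes above can be turned into a simple timing sketch. This is not the state machine of FIG. 96: the function name is hypothetical, it assumes the ByteRequest always arrives in the same cycle that ser_avail is asserted (so ProcByte lasts exactly one cycle), and the behaviour when 'fgo' is not set (no header hold-off) is an assumption.

```python
HDR_WAIT_CYCLES = 30   # HdrWait duration, from the notes above
WAIT_SER_CYCLES = 9    # WaitSer duration, from the notes above


def ser_avail_cycles(num_bytes, first_go=True):
    """Return the cycle numbers at which ser_avail is asserted (ProcByte)."""
    # Assumption: without first_go there is no header hold-off at all.
    cycle = HDR_WAIT_CYCLES if first_go else 0
    cycles = []
    for _ in range(num_bytes):
        cycles.append(cycle)             # one-cycle ProcByte state
        cycle += 1 + WAIT_SER_CYCLES     # ProcByte + WaitSer = 10 cycles
    return cycles
```

Under these assumptions the first three bytes of a first-go command become available at cycles 30, 40 and 50, matching the one-byte-per-10-cycles serial rate.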

framebufs

Structural block that instantiates either one or two framebuf blocks, depending on the buffering generic passed to it.

It makes sure the correct buffer is accessed by the imgsensif and imgproc blocks.

The two rq_clks are each directed to their respective buffers.

The two blocks (imgsensif and imgproc) accessing the frame buffers each provide two memory enable (sens_me(1:0) and user_me(1:0)) signals, one for each buffer. The framebufs block just directs each enable signal to each individual framebuf block, while all other inputs are simply connected to both blocks. For example, sens_me(1) is connected to the sens_me port of framebuf_1.

This block also multiplexes the two sens_dout output buses from each buffer onto the higher level sens_dout. It does likewise for user_dout.

Each block ensures that only one of its enable signals is set at a time, and the higher layer protocol ensures that the two blocks don't clash with each other.

At this point the fs_width generic is used to calculate the size of each framestore buffer RAM (in bytes). This value is passed down as a new generic mem_size.

framebuf

Structural block that instantiates the RAM required for a single frame buffer. Provides write only access for the imgsensif block and read only access to the imgproc block.

fs_ram_bist

This block provides an fs_ram and a BIST block to test it.

RAM bypass is also provided here: the din, addr, en and we signals are concatenated, zero extended to the next 8 bit boundary, chopped into 8 bit chunks and XORed to provide a final 8-bit value. This value is muxed onto the dout port when tmode is active.

Note that two are required per subsambuf block, to provide 70-bit wide access.

fs_ram

This block provides a wrapper around the fs_asic_ram.

It is assumed that the fs_asic_ram is 32 bits wide, with 4 individually writable byte lanes.

This block converts the 8-bit accesses of the main design to 32-bit RAM accesses, and back again. It also converts between the VHDL unsigned types of the main design and the std_logic_vector types of the fs_asic_ram.

This block may need to be recoded depending on the final RAM implementation.

fs_asic_ram

This is the component that must be replaced with the actual silicon RAM.

It is assumed to be single-port, synchronous and 32 bits wide with four independently writeable byte lanes. Its size (in bytes) should be at least fs_width**2, where fs_width is the Callisto top level generic.

subsambufs

Structural block that instantiates either one or two subsambuf blocks, depending on the buffering generic passed to it.

The two rq_clks are each directed to their respective buffers.

The two blocks (imgsensif and imgproc) accessing the subsample buffers each provide two memory enable (sens_me(1:0) and user_me(1:0)) signals, one for each buffer. The subsambufs block just directs each enable signal to each individual subsambuf block, while all other inputs are simply connected to both blocks. For example, sens_me(1) is connected to the sens_me port of subsambuf_1.

This block also multiplexes the two sens_dout output buses from each buffer onto the higher level sens_dout. It does likewise for user_dout.

Each block ensures that only one of its enable signals is set at a time, and the higher layer protocol ensures that the two blocks don't clash with each other.

subsambuf

A structural block that instantiates the RAM required for a single sub-sample buffer. It provides read/write access for the imgsensif block and read only access to the imgproc block.

The address manipulation and data multiplexing is provided at this level.

ss_ram_bist

This block provides an ss_ram and a BIST block to test it.

RAM bypass is also provided here: the din, addr, en and we signals are concatenated, zero extended to the next 35 bit boundary, chopped into 35 bit chunks and XORed to provide a final 35-bit value. This value is muxed onto the dout port when tmode is active. Note that two are required per subsambuf block, to provide 70-bit wide access.

ss_ram

This block provides a wrapper around the ss_asic_ram. It provides no other function than to convert between the VHDL unsigned types of the main design and the std_logic_vector types of the ss_asic_ram.

This block may need to be recoded depending on the final RAM implementation.

ss_asic_ram

This is the component that must be replaced with the actual silicon RAM.

It is single-port, synchronous and 35 bits wide. Its minimum size is determined by the Callisto top level generic fs_width, and is calculated as follows:

ss_width=int((fs_width−1)/3)+1

ss_height=int((ss_width+4−1)/8)+1

ss_mem_size(min)=ss_width*ss_height

where int(x) is the integer part of a real number x.

See the ss_mem_size_f( ) function in the imgsensproc VHDL package.
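As a worked illustration of the sizing formulas above, the following sketch transcribes them directly. It is not the ss_mem_size_f( ) VHDL function itself; the Python function name and the returned tuple layout are hypothetical, and the accepted range of 1 to 256 is taken from the description of the fs_width generic.

```python
def ss_mem_size(fs_width):
    """Return (ss_width, ss_height, minimum ss_mem_size) for a given fs_width."""
    if not 1 <= fs_width <= 256:
        raise ValueError("fs_width is expected to be in the range 1 to 256")
    # int(x) in the text is the integer part; for non-negative operands
    # this matches Python floor division.
    ss_width = (fs_width - 1) // 3 + 1
    ss_height = (ss_width + 4 - 1) // 8 + 1
    return ss_width, ss_height, ss_width * ss_height
```

For the default fs_width of 128 this gives ss_width = 43, ss_height = 6 and a minimum size of 258 RAM words.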

imgsensif

As shown in FIG. 97, imgsensif is a structural block that pushes data from the sensor to the frame and sub-sampled buffers.

sens_mux

Enables either the sensor interface or the serial interface to write frame data. Always clocked; see also section Clocking.

It detects the rising edge of isync and generates a single pulse on the outgoing isync1. In test mode, this block will also present every tenth value of the sensor interface to the serialif block via the test_data signal.

sens_ctrl

Controls which buffer a frame will go into, and controls the sensor side clocks.

If a buffer is available, sens_ctrl passes data through to the next available buffer and waits for ‘EOW’ from sens_ss. ‘EOW’ marks a buffer as full and causes sens_ctrl to generate ‘new_frame’ to the serialif. ‘fin_frm_proc’ from the serialif frees the oldest buffer. If no buffer is available at the start of a frame, the frame is dropped and a ‘frame_missed’ pulse is generated.

Two VHDL architectures are provided in the design: the fsm architecture is a double-buffered version (FIG. 98), while the onebuf architecture is a single buffered version (FIG. 99).

sens_fs

sens_fs performs the windowing function and writes all data inside the window into the frame store buffer.

It also calculates sub-sample pixel sub-row values (performing pixel replication where required) and passes them to the sens_ss block.

These sub-sample pixel sub-row values are the sum of the three pixels in the same row of a sub-sample pixel. Thus, over three rows of frame pixels, three sub-row values are sent for each sub-sample pixel. When pixel replication is performed on the bottom edge, fewer than three sub-row values are sent.

Sub-sample pixel replication is performed at the right and lower edges of the window. First, the end frame pixel is replicated to the right if required, producing an intermediate sum with any unreplicated pixels in the same row. Then, only during the last row of the window, this intermediate sum is also multiplied by 1 plus the number of rows that need to be filled (either 1 or 2). This is the final sub-row value that is passed to the sens_ss block.

sens_ss

sens_ss takes the sub-sample row value and updates the sub-sample buffer.

The subsample buffer is capable of accumulating 11 bits per pixel for an entire row of subsample pixels at a time.

When the first sub-row value for a sub-sample pixel arrives, it overwrites the value in the subsample buffer. When the second or third sub-row value arrives, it is added to the value in the sub-sample buffer. When the last sub-row value arrives (and this may also be the first, second or third sub-row value depending on bottom edge pixel replication) the result is divided by 9 before being written to the sub-sample buffer.
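The split of work between sens_fs (sub-row sums with edge replication) and sens_ss (overwrite, accumulate, divide by 9) can be sketched as a behavioural model. The function names are hypothetical, the model works on a whole window held in memory rather than a pixel stream, and the divide is written as an exact round-to-nearest integer expression rather than the shift-and-add circuit used in the device.

```python
def sub_row_values(row, is_last_row, rows_to_fill):
    """sens_fs side: one sub-row sum per sub-sample pixel for one frame row."""
    width = len(row)
    values = []
    for left in range(0, width, 3):
        # Replicate the end pixel to the right where the window runs out.
        total = sum(row[min(x, width - 1)] for x in range(left, left + 3))
        if is_last_row:
            total *= 1 + rows_to_fill   # stands in for the missing rows below
        values.append(total)
    return values


def subsample(window):
    """sens_ss side: accumulate sub-row values and divide by 9 on the last one."""
    height = len(window)
    out, acc = [], []
    for y, row in enumerate(window):
        last_row = (y == height - 1)
        rows_to_fill = (2 - y % 3) if last_row else 0
        values = sub_row_values(row, last_row, rows_to_fill)
        if y % 3 == 0:
            acc = list(values)                          # first: overwrite
        else:
            acc = [a + v for a, v in zip(acc, values)]  # later: add
        if y % 3 == 2 or last_row:
            out.append([(2 * a + 9) // 18 for a in acc])  # round(a / 9)
        else:
            assert max(acc) < 2 ** 11   # stored intermediates fit in 11 bits
    return out
```

The internal assertion reflects the point made in the text: only sums of at most two sub-rows are ever stored back, so 11 bits per pixel are enough.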

flash_expose

The flash_expose block generates the flash and expose image sensor timing output signals. The timing of these signals is based on either the internally or externally generated capture timing signal, and the flash and expose delay and high-time configuration values. A 24-bit counter is used to either generate or track the capture signal depending on the state of the Captureln configuration bit. Two 16-bit counters are used to generate the flash and expose signals. These counters (one for flash, one for expose) are loaded with the delay value when the capture signal is pulsed. They count down and are subsequently loaded with the high-time value when the count is zero, at which point the timing signal (flash or expose) is asserted. When the high-time count reaches zero, the timing signal is negated and the counter remains inactive until the capture pulse is detected.

The flash_expose block accepts the variant generic which disables the generation of the fval signal, which is used only on the Europa design.

synch

A structural block containing synchronizers for data transfers between the sclk and iclk domains. Three types of signal synchronization are used: level, reset and pulse.

synchronizer

Synchronizes a signal using a standard n-stage synchronizer with the number of stages defined by the num_sync_stages_nc constant (3). The synchronizer design is illustrated in FIG. 100.

reset_sync

The reset_sync block synchronizes an active-low reset signal and produces an asynchronous assert (falling edge) and synchronous negate (rising edge). The number of synchronizer stages is defined by the num_sync_stages_nc constant (3). This synchronizer uses flipflops that are not reset. The test mode input (tmode) enables the output reset signal to be fully controllable during scan testing. The reset_sync design is illustrated in FIG. 101.

sig_pulse_sync

The sig_pulse_sync block synchronizes a pulse from one timing domain to another. Due to scan-test restrictions, this is implemented using flipflops (instead of latches). The operation is as follows: the rising edge of the source pulse asserts the req signal. This req signal is then synchronized by the destination clock and the rising edge of the synchronized req signal used to generate a pulse in the destination clock domain. Meanwhile, the synchronized req signal is fed back to the source domain, where it acts as an acknowledge. It is synchronized and used to reset the original req flipflop. The sig_pulse_sync design is illustrated in FIG. 102.

VHDL Generics

There are three independent generics used in the design.

The variant generic takes on the values v_europa or v_callisto. This is set on the instantiation of the core block, and is spread throughout the design where required. It is used mostly to optimise the subsample buffer address equation, but also in the sif_msghand block.

The buffering generic takes on the values b_single or b_double. It is also set on the instantiation of the core and spread where needed. It is used to conditionally instantiate the second of the double buffers. It is picked up by the config block to be reflected in the BufferingMode field of the Chip ID register.

The fs_width generic is set on the callisto entity at the very top of the design. It defines the width and height of the framestore buffer (each framestore buffer must hold at least fs_width*fs_width bytes) and it can take on values 1 to 256. This value is used to calculate the framestore buffer RAM address from the (x,y) coordinates and the subsample buffer RAM address as described above under the Architectural Overview.

The framebufs and subsambufs blocks use the fs_width generic to calculate the ss_asic_ram and fs_asic_ram memory sizes, which are passed down as the mem_size generic. This mem_size generic is used by the BIST circuitry to calculate the number of RAM addresses to test, and by the ss_asic_ram and fs_asic_ram behavioural models, which assume that the final Callisto implementation actually uses the minimum required memory sizes for a given fs_width. If more memory is actually used than is defined by fs_width, it will be used, but will not be tested by BIST.

The three generics always appear on component entities with default values of v_europa, b_double and 128 respectively. These defaults are purposely set to the values required to synthesize Europa.

BUFFERING

The design of the entire core is such that single and double buffering can be handled with relative ease.

The double-buffering scheme is fundamentally controlled inside the sens_ctrl block. It controls its own writes to the buffers, and when a new buffer is received, the new_frame event it sends to the serialif contains the number of the buffer that was written. It is this value that the serialif subsequently includes with all its image commands to the imgproc block, and uses to enable the sclk onto the appropriate rq_clk.

The single buffered architecture (onebuf) of the sens_ctrl block will only allow one buffer, and will only ever set new_frame.data.id to ‘0’. FIG. 103 shows new frame events in a double buffering environment.

Single Buffering

The basic cycle under normal operation, i.e. no missed frames, is shown in FIG. 104. FIG. 105 shows normal operation, including all commands.

Single Buffer—Normal Operation

FIG. 106 shows a frame arriving at the imgsensif before the “Finished Frame Processing” event arrives from the processor. We see that the “New Frame” only comes in response to isync after “Finished Frame Processing”.

Double Buffering

FIGS. 107, 108 and 109 respectively show double buffering with:

-   Same cadence as normal operation for single buffer
-   No missed frames, simultaneous read and write
-   One missed frame

CLOCK CIRCUITS

There are three main aspects to the clocking of registers:

-   Separate input clocks for serial and sensor timing
-   Buffer access by circuits in these two clock domains
-   Low power operation

The following clocks are derived from the two input clocks:

-   s_clk: always active, straight from sclk
-   i_clk: always active, straight from iclk
-   sq_clk: sourced from sclk. Active from when a frame becomes available and disabled in low power mode
-   iq_clk: sourced from iclk. Active only when the sensor is writing to the buffers and disabled when in low power mode
-   rq_clk(0): active when buffer 0 is being accessed
-   rq_clk(1): active when buffer 1 is being accessed (double buffered incarnation only)

Fundamental to the clocking strategy is the assumption that interaction between clocks within the two clocking families (‘i’ and ‘s’) does not require any special circuitry. Synthesis using appropriately defined inter-clock skew, followed by corresponding clock tree skew balancing during layout, allows this to be realised.

Each of the two rq_clks drives one of the two buffers in the double buffering scheme.

Each rq_clk can be sourced from either s_clk or i_clk (or neither), depending on what function is accessing the buffer; the internal protocol ensures only one side will access a buffer at any one time.

Each of the sclk and iclk domains controls its own drive to the rq_clks. The internal protocol for swapping clocks requires each domain to simultaneously turn off its drive to an rq_clk and to send an indication to the other clock domain through a synchronizer. It is the latency provided by the synchronizer that guarantees only one domain will be driving an rq_clk.

IMAGE PROCESSING ARITHMETIC PRECISION

There are three places where precision is a factor:

-   Range-Expansion and Thresholding
-   Sub-pixel Generation
-   Image Subsampling

Range-Expansion and Thresholding

Referring to Section 3.3.5, there are no special requirements for maintaining precision in the following equation:

v>=((t/255)*(max−min))+min

The t/255 value is presented as a 0.8 fixed-point binary number: it is not actually calculated in the device.

At all stages, full precision is maintained by increasing the number of bits where necessary.
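The comparison can be carried out entirely in integers if both sides are scaled by 256, which is one way to read "increasing the number of bits where necessary". The sketch below assumes the 0.8 fixed-point value is an unsigned 8-bit fraction t_frac representing t_frac/256, and that a true comparison maps to an output bit of 1; the function and parameter names are hypothetical.

```python
def threshold_bit(v, t_frac, lo, hi):
    """Evaluate v >= t*(hi - lo) + lo with t given as a 0.8 fixed-point value.

    v, lo, hi are 8-bit pixel values; t_frac is 0..255 and stands for
    t_frac / 256. Both sides are scaled by 256 so no precision is lost.
    """
    lhs = v << 8                            # v with 8 fractional bits
    rhs = t_frac * (hi - lo) + (lo << 8)    # 16-bit product plus scaled min
    return 1 if lhs >= rhs else 0           # bit polarity is an assumption
```

With t_frac = 128 (one half), lo = 0 and hi = 255, a pixel value of 128 is at or above the threshold while 127 is below it.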

Sub-Pixel Generation

All operations are fixed point binary. At all stages, full precision is maintained by increasing the number of bits where necessary.

Rounding is performed by starting with a constant value of b0.1 (binary 1/2) in the accumulator, and simply truncating at the end.

Image Subsampling

The sub-sampling process basically requires nine 8-bit values to be summed, then divided by 9 and rounded to produce an 8-bit result.

The precision of the design is operationally equivalent to floating point precision, i.e. for all possible input values the result is indistinguishable from that of a floating point processor.

This is achieved in two ways.

-   The summation process only requires that the number of bits of storage at all stages is sufficient to hold the full range of values that could be possible at that stage. The result of this process is a 12-bit unsigned number, which is adequate to store all numbers from 0 to 255*9.
-   The ‘divide by 9 and round’ process is more complex.

We were able to use a Taylor expansion to get the desired result using only a subtractor, two adders and some shifting.

We ‘lucked in’ here because the binary value of 9 is b1001, which can also be represented as b1000*b1.001. Thus we have:

result=int(b0.1+acc/(b1000*b1.001))

The (acc/b1000) term is trivial: it is just a fixed point shift, which costs nothing in terms of gates.

So we are left with the interesting problem:

acc/b1.001

The constant b1.001 can be rewritten as (1+x) where x is b0.001.

Using the Taylor expansion, we get

$$\begin{aligned}\frac{acc}{1+x} &= acc \cdot \left(1 - x + x^{2} - x^{3} + \ldots\right) \\ &= acc \cdot (1 - x) \cdot \left(1 + x^{2} + x^{4} + \ldots\right)\end{aligned}$$

or more specifically, for x=b0.001,

acc/(1+b0.001)=acc*(1−b0.001)*(1+b0.000001+b0.000000000001+ . . . )

This still involves an infinite series, but the task here is to find out how many of the increasingly smaller terms are required to give the desired accuracy.

The solution was to use a brute force method to check the result of all possible input values (0 to 255*9). The final function used only the (1+x^2) terms; however a small constant value was added to the final result to approximate the x^4 term over the input range. We did it this way because we had to add a constant b0.1 at the end for rounding anyway, so we just added a slightly bigger constant.
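The approach can be illustrated with a small integer model that uses one subtractor, two adders and shifts, followed by the same kind of brute-force check over every possible accumulator value. This is a sketch under stated assumptions: the function name is hypothetical, and the extra constant of 32 (in units of 2^-12) is a value chosen here for illustration, not the constant used in the actual design.

```python
def div9_round(acc, extra=32):
    """Approximate round(acc / 9) for 0 <= acc <= 255 * 9 using shifts and adds.

    With x = 1/8: acc/9 = (acc/8) * (1 - x) * (1 + x^2 + x^4 + ...).
    Only the (1 + x^2) term is kept; `extra` stands in for the rest.
    """
    y = (acc << 3) - acc          # acc * (1 - x), scaled by 2**3 (subtractor)
    z = (y << 6) + y              # times (1 + x^2), scaled by 2**9 (adder)
    # Add the rounding constant b0.1 (2**11 at this scale) plus a small
    # extra constant, then truncate. The remaining divide by 8 is folded
    # into the final shift of 12 bits (second adder).
    return (z + (1 << 11) + extra) >> 12


def check_all_inputs(extra=32):
    """Brute-force comparison against exact round-to-nearest division by 9."""
    return all(div9_round(acc, extra) == (2 * acc + 9) // 18
               for acc in range(255 * 9 + 1))
```

In this model, check_all_inputs() passes with the illustrative constant, while dropping the extra constant altogether produces wrong results for some larger accumulator values, which is consistent with the need for a slightly bigger rounding constant described above.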

INTEGRATED MEMORY

All RAMs are synchronous single-port with separate read and write data ports.

The general access methods are shown in FIG. 110. The update cycle is just a read followed by write.

Frame Buffers

Each frame buffer is a simple, linearly addressed, byte-wide, single-port synchronous SRAM. By design, only one of the two addressing ports will access the RAM at a time. A generic, fs_width, defining the maximum row width is used to generate the linear address from the (x,y) coordinates:

Address=x+(y*fs_width)

Sub-Sample Buffers

The sub-sample buffers are designed to allow single cycle access to the pixels of 8 contiguous rows from the same column, but with the added feature of addressing on any 4-row boundary. This provides single cycle access to any pixel, and the two pixels above and the two pixels below, for the auto-level-threshold algorithm.

As shown in FIG. 111, each buffer is implemented with two 4-byte wide RAMs, some on-the-fly addressing and some re-ordering of the output data. Each RAM is cut into slices: each slice is the length of the maximum row width, and thus each slice contains four contiguous rows side by side. Slices from each RAM are alternated to provide all the required rows.

The two RAMs (RAM0 and RAM1) are addressed separately. If the address is supplied without an offset, both RAMs are given the same address. The resulting 8-byte data word gets its four least significant bytes from RAM0 and its four most significant bytes from RAM1. If the address is supplied with an offset, RAM1 gets the address as normal, but the RAM0 address is offset by the maximum row length (N), thus retrieving data from the same column, but for the four rows below, rather than above. The resulting 8-byte data word is formed with its four least significant bytes from RAM1 and its four most significant bytes from RAM0, i.e. the 4-byte words are swapped inside the result.

The fs_width generic is used to calculate the maximum subsample row width ss_width, which is used to generate the linear sub-sample address from the logical (x,y) subsample array coordinates:

Address=x+ss_width*(y/8)

where the division function “/” is the standard VHDL definition of “/”.

An extra bit, the offset, is supplied with the address. It indicates whether or not to offset the addressing of RAM0. This is calculated as:

Offset=‘1’ when (y mod 8)>=4

Example 1: X = 0, Y = 0 => Address = 0, Offset = 0
RAM0_addr = 0 => data out is Column 0, rows 0 to 3
RAM1_addr = 0 => data out is Column 0, rows 4 to 7
final result is (LSB first) Column 0, rows 0 to 3, Column 0, rows 4 to 7 = Column 0, rows 0 to 7

Example 2: X = N-1, Y = 4 => Address = N-1, Offset = 1
RAM0_addr = N-1 + N (the extra + N due to Offset = 1) = 2N-1 => data out is Column N-1, rows 8 to 11
RAM1_addr = N-1 => data out is Column N-1, rows 4 to 7
final result is (LSB first) Column N-1, rows 4 to 7, Column N-1, rows 8 to 11 = Column N-1, rows 4 to 11
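The address and offset calculation, together with the two worked examples, can be reproduced with a short sketch. The function name and the returned dictionary layout are hypothetical, and the maximum row length N is taken to be ss_width, which is an assumption consistent with the examples.

```python
def subsample_access(x, y, ss_width):
    """Compute the RAM addresses and the byte rows returned for one access."""
    address = x + ss_width * (y // 8)      # VHDL "/" on non-negative integers
    offset = 1 if (y % 8) >= 4 else 0
    ram1_addr = address
    ram0_addr = address + ss_width * offset   # RAM0 shifted by one slice
    first_row = 8 * (y // 8) + 4 * offset     # lowest byte row in the result
    return {
        "address": address,
        "offset": offset,
        "ram0_addr": ram0_addr,
        "ram1_addr": ram1_addr,
        "swap_words": bool(offset),           # RAM1 supplies the low 4 bytes
        "rows": (first_row, first_row + 7),
    }
```

With ss_width = 43, the call subsample_access(42, 4, 43) gives RAM0_addr = 85 (that is, 2N-1), RAM1_addr = 42 and byte rows 4 to 11, matching Example 2. In this model the returned rows always include the row index y itself and the four rows after it, so a 5-row column is available from a single access.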

A layer of logical addressing sits over the physical addressing: the logical byte rows, which actually start at −2, are mapped to the physical rows starting at 0. This is done so that the 8 bytes accessed by the physical sub-sample address always contain the 5 bytes required for one column of the auto-levelling window centred around the pixel at the (x,y) coordinate.

This means that the first two byte rows in RAM0 are wasted, but this helps to simplify the design of the auto-level-threshold. The simplification comes from the fact that you can just use the Y coordinate of the row being auto-level-thresholded and you always get the two rows above and the two rows below.

The last two byte rows are also effectively wasted. However, they will contain copies of the last row of the window (see Section on page 97).

Each RAM will actually be 35 bits wide rather than 32 bits wide. The extra three bits will be used by the sensor side to provide the required precision for the sub-sample accumulation, and will be ignored otherwise.

The reason for the extra three bits is that the maximum intermediate value that needs to be stored is the sum of two rows of three columns of maximum pixels, i.e. 6*255, which requires 11 bits total. These extra three bits will be re-used by each row in the slice of four, since the storage for the extra precision is not required once a sub-sample row is complete, and we only store the final 8-bit value.
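The bit-width arithmetic can be verified directly. The variable names below are illustrative only.

```python
MAX_PIXEL = 255

# Two rows of three columns of maximum-valued pixels.
max_intermediate = 2 * 3 * MAX_PIXEL
bits_needed = max_intermediate.bit_length()   # bits to hold the largest partial sum
extra_bits = bits_needed - 8                  # beyond the final 8-bit value
ram_width = 4 * 8 + extra_bits                # four byte rows plus shared extra bits
```

This gives a maximum partial sum of 1530, which needs 11 bits, hence 3 extra bits and a 35-bit RAM word.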

SYSTEM TEST CIRCUITS

Direct Frame Store Writing

The direct frame store writing feature is intended to be a system-level testing feature, allowing Callisto to be tested without an image sensor. Frame data is loaded into the frame store by a series of image commands, each containing four pixels' worth of data. The serial interface block sif_msghand interprets frame store write messages and generates command words. When the WriteFrame configuration bit is set, the sens_mux block ignores the external image sensor data and drives the internal image data signals with the data received from the serial interface command words.

To allow all possible iclk/sclk frequency relationships, a high-level flow control mechanism is used whereby the sens_mux block triggers the transmission of the frame_store_write_ack message when the current command is processed.

Image Sensor Data to Serial Interface

When the test enable input (ten) is asserted, Callisto pushes data received from the image sensor directly out of the serial interface. This operation is intended to assist manufacturing testing of the image sensor on the Jupiter device. Due to the bandwidth mismatch, Callisto samples every tenth byte received from the image sensor, and if this byte is valid it is sent to the serial interface for serialization and transmission on txd.

DEVICE TEST CIRCUITS

Scan

A single scan chain is to be used for Callisto. Scan testing will be performed using sclk only, and will therefore require the tmode input to force sclk, via multiplexers, onto all clock nets. In addition, the assertion of the tmode input will be used to disable any logic that is not scan testable. The control of the tmode and sen inputs during scan testing is illustrated in FIG. 112. Due to the multiple clock domains and the use of negative-edge-triggered flip-flops, careful attention must be paid to the scan chain ordering. Lock-up latches between different clock trees may be necessary. The SRAM cores may be put in a bypass or transparent mode to increase coverage of signals going to and from these cores.

RAM BIST

Each of the four instantiated SRAMs has associated BIST logic. This circuitry is used for ASIC manufacturing test of the RAM cores and runs a 13n MOVI RAM test pattern sequence. The BIST operation is controlled and monitored via the configuration registers. The test enable input signal (tmode) must be asserted during BIST testing to ensure the RAM clocks are driven by sclk.

Section F—Filtering and Subsampling

This section considers hardware implementations of low-pass filtering and subsampling (or decimation).

FIR filters are computationally intensive and in general, for real-time video applications, require dedicated hardware which can exploit parallelism to increase throughput. To achieve linear phase, the FIR will have symmetric coefficients and, with square pixels, can apply the same filtering in X and Y dimensions, which simplifies the hardware. When the filter output is to be decimated, further savings can be made as only input samples required to produce an output are taken into account. Usually, the 2D filter can be decomposed into an X filter and Y filter in cascade. For example, a 5-tap symmetric filter has 3 coefficient values so that 2 pre-adds can be used, requiring only 3 multiplications per output sample. Since 2 filters in cascade are needed, 6 multiplications per sample are required. The process could be pipelined depending on the acceptable latency, so up to 10 ms could be used at the cost of extra memory. At the other extreme, the filter could process data directly from the image array as it is read out, or read it from the fieldstore at a lower speed.
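The saving from symmetry and decimation can be illustrated in one dimension. The sketch below is a software model under stated assumptions, not a description of the hardware: the function name and the coefficient arguments c0, c1 and c2 (centre, inner and outer taps) are hypothetical, and samples too close to either end of the line are simply skipped.

```python
def symmetric_fir5_decimate(samples, c0, c1, c2, factor=2):
    """Apply a 5-tap symmetric FIR (taps c2, c1, c0, c1, c2) with decimation.

    Only the outputs that survive decimation are computed, and the two
    pre-adds reduce the work to three multiplications per output.
    """
    outputs = []
    for centre in range(2, len(samples) - 2, factor):
        pre_add_outer = samples[centre - 2] + samples[centre + 2]
        pre_add_inner = samples[centre - 1] + samples[centre + 1]
        outputs.append(c0 * samples[centre]
                       + c1 * pre_add_inner
                       + c2 * pre_add_outer)
    return outputs
```

Running the same routine along rows and then along columns gives the cascaded X and Y filters, with six multiplications per retained sample in total.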

Direct 80 and Transpose 82 forms of symmetric FIR filters are shown in FIG. 113. In some implementations, the transpose form 82 may have some advantage over the direct form 80. The combinatorial paths are shorter, giving a faster design, but a disadvantage is that the delays no longer form a shift register and cannot be used to store elements of the original input data.

If a low-pass characteristic that is skew-symmetric is used, even coefficients will be zero except for the central one, which reduces the computational effort. This implies odd-length filters of order (4M+3). Maximally flat filters:

-   M=0, coefficients 1 2 1
-   M=1, coefficients −1 0 9 16 9 0 −1
-   Coefficients are of the form:
    h=n/2^(k)
    where n and k are integers, which makes exact implementation easy.
    Only decimation by a factor of 2 is possible in one stage.

The partitioning and addressing of the fieldstore can be arranged such that neighbouring pixels are concurrently available, allowing 2D filtering on the fly without extra memory. This allows the processor to obtain the sub-sampled image pixels and store them for segmentation. A histogram can also be built on the fly.

The example shown in FIG. 115 partitions the memory into 4 blocks, which is particularly simple for addressing (being a power of 2). However, due to symmetry requirements, all coefficients must be equal so only a simple sinc response can be obtained. Furthermore, such a filter has a delay of half a pixel which is difficult to compensate for if the segmented image is used directly to estimate the centres of tag targets.

Decimation by 2 in both X and Y directions is inferred unless a slightly modified addressing scheme is used which allows odd and even samples from adjacent blocks to be read at the same time.

Clearly more coefficients are needed and preferably should be an odd number so that the image is delayed by an integer number of pixels.

As shown in FIG. 116, the number of memory blocks increases as the square of the number of filter taps in X or Y so this approach rapidly becomes impractical. Also, as mentioned above, the decimation factor is tied to the filter order unless a more complex addressing scheme and coefficient switching are used (which prevents constant coefficient multipliers being used).

It is preferable to partition the framestore to provide concurrent line access only and add additional pixel delays to make the X filter. Then, to allow a decimation factor which is not equal to the filter order, a slightly more complex addressing scheme is used and multiplexers added to route the samples to the adders and multipliers, allowing the use of fixed coefficients.

In the example shown in FIG. 117, a 5th order FIR filter is assumed. Image lines are written sequentially to 5 memory blocks so that 5 lines may be read concurrently. Since data cannot be shifted from one memory block to another, a virtual shift register is formed with multiplexers. It may be that some paths are not required depending on the filter order and decimation factor N. Some sharing of the adders and multipliers (ROMs) is also possible depending on N.

The cost of adding a few linestores is small compared to the fieldstore. If decimation is required, the X filter benefits from the lower input rate. If separate linestores are used with decimation, the X filter is performed first and decimated, thus reducing the storage and speed requirements of the linestores.

It will be appreciated that multiplier-less filters can be implemented using shift and add functions. Canonical signed digit or other redundant binary arithmetic scheme (−1, 0, 1) can also be used.
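As an illustration of a multiplier-less filter, the M=1 maximally flat coefficients (−1 0 9 16 9 0 −1) sum to 32, so each coefficient is n/2^5 and the filter reduces to shifts and adds. The following is a minimal one-dimensional sketch with decimation by 2. The function name is hypothetical, integer samples are assumed, edge samples are skipped, and the final shift truncates rather than rounds.

```python
def halfband7_decimate2(samples):
    """7-tap half-band filter (-1 0 9 16 9 0 -1)/32 using only shifts and adds."""
    outputs = []
    for centre in range(3, len(samples) - 3, 2):
        outer = samples[centre - 3] + samples[centre + 3]   # weight -1
        inner = samples[centre - 1] + samples[centre + 1]   # weight 9 = 8 + 1
        acc = (samples[centre] << 4) + (inner << 3) + inner - outer
        outputs.append(acc >> 5)                            # divide by 32
    return outputs
```

The zero-valued taps at offsets of two samples from the centre are never read, which is where the reduced computational effort of the skew-symmetric characteristic comes from.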

Section G—Tag Sensing Algorithms

As described extensively in many of the cross-referenced documents, the preferred Netpage system relies on knowing the identity of the page with which the Netpage pen nib is in contact and the absolute position of the nib on the page. Knowledge of the pen orientation relative to the page is also required. In addition, various regions of the page may be given special properties that need to be known by the pen without referring back to some external server, i.e. they must be determined directly from the page with which it is in contact.

This requirement is achieved by printing tags on the page. The tags encode the data required by the system. These are the page identity, the tag location within the page and the properties of the region of the page containing the tag. The orientation of the pen relative to the page and the position of the pen nib with respect to the tag location can be determined from the location of the tag image in the pen's field of view and from the perspective distortion of the image of the tag. The tags are printed using infrared absorptive ink so that they will be invisible to the naked eye.

Two sample tag designs are shown in FIGS. 119 to 122, which are described in detail below. The present description assumes the tag structure of FIGS. 119 and 120, although very little depends on the exact form of the tags. Many aspects of the tag sensing and decoding, especially the determination of the pen orientation and relative position, are described in detail in PCT Application PCT/AU00/00568.

The main focus of this report is on the image processing required to determine the tag location and perspective distortion and to sense the tag data. This task is made challenging by the requirements that the image consist of as few pixels as possible, by the effects of defocus blur and perspective distortion due to pen tilt, by motion blur, by shadows due to ambient illumination and by imperfections due to the printing process and damage to the page. Further, this processing must typically be performed by a battery-powered device at a rate of 100 times per second or more.

The Structure of Netpage Tags

The tags considered in this report consist of two components: targets and macrodots. The tag information is encoded in an array of macrodots. These consist of small solid circles about 130 μm in diameter. The presence of a macrodot indicates a bit value of 1, its absence a value of 0. The data is encoded with a forward error correcting code. The tags described in PCT Application No. PCT/AU00/01111 use a (15,7) Reed-Solomon code in GF(16) (which is described in more detail below). The targets are solid circles just over 300 μm in diameter. The targets delineate the different tags on a page and provide reference points from which the locations of the macrodots, which encode the individual tag data bits, can be found.

The macrodots do not abut one another, thereby avoiding the formation of dark regions that appear similar to the targets, and there is a white border around the targets of at least 150 μm. Hence, the targets are always clearly visible. The exact numbers of targets or macrodots are not important to the design of the algorithm, other than that there needs to be at least four targets to allow the determination of the perspective transform. For convenience, we will always assume there are four targets. The dimensions are chosen to ensure the targets are clearly distinguishable.

Tag Sensing and Decoding

The algorithm proceeds through a number of stages to extract the required information from images of the tags. Generally, there are six steps after image acquisition:

-   1. Create a list of target candidates;
-   2. Select four candidates as the tag targets;
-   3. Determine the page-to-sensor transform;
-   4. Determine the tag bit pattern;
-   5. Decode the tag region identity and position code and any flags;
-   6. Determine the location of the pen nib and the pen orientation from the perspective transform and the location of the tag centre.

Steps 1 and 2 can be merged, but it is simpler to keep them distinct. Steps 4 and 5 can be performed concurrently, as the data is often extracted a word at a time. Further, there are a number of alternative options for performing each of these steps. Of all these steps, it is steps 1 and 2 that present the most challenges, although, in the presence of severe shadowing, step 4 can also be difficult.

The page-to-sensor transform of step 3 is straightforward. There are well-known procedures for deriving the perspective transform given the mapping of one quadrilateral into another (for example, see Section 3.4.2, pp. 53-56, of Wolberg, G., Digital Image Warping, IEEE Computer Society Press, 1990). The algorithm for step 6, determining the pen orientation and displacement, is fully described in PCT Application PCT/AU00/00568. Hence these two steps are not described in this document.

Tag Sensing and Decoding Algorithm

OVERVIEW OF THE IMAGE PROCESSING

FIG. 119 shows the tag image processing chain. The first two steps condition the image for segmentation. The local dynamic range expansion operation 84 corrects for the effects of varying illumination, in particular when shadows are present. This is followed by thresholding 86, in preparation for segmentation 88. Moments-based criteria are then used to extract 90 a list of candidate targets from the segmented image. These first four steps correspond to step 1 in the preceding paragraphs. Geometric filtering 92 is used to select a set of targets. This is step 2 described above. The page-to-sensor transform is determined 94 using the target locations (step 3) and finally, the macrodots are sampled 96 to obtain the codewords (step 4).

Tag Image Processing Chain

FINDING THE TAGS

The targets are used to delineate the different tags on a page and provide reference points from which the locations of the macrodots, which encode the individual tag data bits, can be found. Once a suitable set of four targets delineating a single tag has been found, a perspective transform can be used to begin decoding of the tag. The identification of a set of targets proceeds in two stages. First, a collection of target candidates is found, and then four of these are selected to be the final set of targets. The search for the target candidates is performed directly on the image acquired by the pen and is the most costly and difficult step in terms of computation and algorithm development.

Creating the List of Candidate Targets

The preferred algorithm to create the list of candidate targets consists of a number of steps:

-   1. Local dynamic range expansion;
-   2. Thresholding;
-   3. Segmentation;
-   4. Target filtering using moments.

Step 1 preprocesses the image for conversion into a binary image (step 2), which is then segmented. The thresholding (step 2) can be carried out as the segmentation (step 3) is performed. It is more efficient, however, to incorporate it into the local dynamic range expansion operation, as will be shown below. The list of image segments is then searched for target-like objects. Since the targets are solid circles, the search is for perspective-distorted solid circles.

From the point of view of computation time and memory requirements, finding the candidate targets is the most expensive portion of the algorithm. This is because in all phases of this process, the algorithm is working on the full set of pixels.

Local Dynamic Range Expansion

The local dynamic range expansion algorithm goes much of the way to removing the effects of shadows and general variations in illumination across the field of view. In particular, it allows thresholding to be performed using a fixed threshold.

For each pixel, a histogram of the pixels in a window of specified radius about the current pixel is constructed. Then the value which a specified fraction of the pixels are less than is determined. This becomes the black level. Next, the value which a specified fraction of the pixels are greater than is also found. This becomes the white level. Finally, the current pixel value is mapped to a new value as follows. If its original value is less than the black level, it is mapped to 0, the minimum pixel value. If its value is greater than the white level, it is mapped to 255, the maximum pixel value. Values between the black and white levels are mapped linearly into the range 0-255.

Since the local dynamic range expansion operation must access all the pixels in a window around each pixel, it is the most expensive step in the processing chain. It is controlled by three parameters: the window radius, the black level percentile and the white level percentile. The values of these parameters used to find the targets in this work are 2, 2% and 2%, respectively. It is also convenient to perform thresholding simultaneously with dynamic range expansion. The threshold value for the range-expanded image is fixed at 128.

The values of the local dynamic range expansion parameters are such as to allow considerable optimisation of the local dynamic range expansion algorithm. In particular, a radius 2 window becomes a rectangular window containing 25 pixels. 2% of 25 is 0.5, hence to determine the black and white levels, it suffices to determine the minimum and maximum pixels in the window. The pixel mapping operation can be eliminated by calculating the local threshold for the unmapped pixel value directly using the equation

((black level) + (white level))/2

which approximates the exact value given by

(black level) + [128 ((white level) − (black level))]/255
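The optimised form, in which the minimum and maximum of the 5 by 5 window give the threshold directly, might be sketched as follows. This is an unoptimised illustration under stated assumptions: the function name is hypothetical, windows are clipped at the image border (the text does not say how edges are treated), and a pixel is taken as foreground only when it is strictly greater than the local threshold.

```python
def local_threshold_image(image, radius=2):
    """Threshold each pixel against the mid-point of its local min and max.

    `image` is a list of rows of integer pixel values (0-255).
    Returns a binary image of the same size.
    """
    height, width = len(image), len(image[0])
    binary = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            black_level, white_level = min(window), max(window)
            threshold = (black_level + white_level) >> 1   # divide by 2 as a shift
            binary[y][x] = 1 if image[y][x] > threshold else 0
    return binary
```

In a perfectly flat region the black and white levels coincide, so under the strict comparison assumed here every pixel maps to 0.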

Given that the number of pixels in the window is much less than the number of bins in the histogram (there are 256), and that it is sufficient to find only the maximum and minimum pixels in the window, it is more efficient to find these values directly by examining all the pixels in the local window of each pixel. The maxima and minima for the local window are best calculated from the maxima and minima of the columns making up the window. This way, as each pixel on a row is processed, the subresults from the previous pixel can be reused.
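The reuse of column results can be sketched as below. The function name is hypothetical and border windows are again assumed to be clipped. For each image row the column minima and maxima over the window rows are computed once, and each pixel's window extrema are then taken over the neighbouring column results.

```python
def window_extrema(image, radius=2):
    """Return (mins, maxs): per-pixel minimum and maximum over the local window."""
    height, width = len(image), len(image[0])
    mins = [[0] * width for _ in range(height)]
    maxs = [[0] * width for _ in range(height)]
    for y in range(height):
        rows = image[max(0, y - radius):min(height, y + radius + 1)]
        # Column sub-results, shared by every window on this row.
        col_min = [min(column) for column in zip(*rows)]
        col_max = [max(column) for column in zip(*rows)]
        for x in range(width):
            lo, hi = max(0, x - radius), min(width, x + radius + 1)
            mins[y][x] = min(col_min[lo:hi])
            maxs[y][x] = max(col_max[lo:hi])
    return mins, maxs
```

Moving one pixel along a row brings in only one new column result, which is the reuse the paragraph above refers to; this sketch recomputes the small slice for clarity rather than maintaining it incrementally.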

With these considerations in mind, the cost per pixel of the local dynamic range expansion operation is shown in the following table. The divide by 2 can be implemented as an arithmetic shift right. The count for the register copies is a worst-case count; on average there would be 9 register copies per pixel. All these operations can be performed using 16-bit integers. From the following table, the total operations count per pixel is 65. The only significant memory required is for the thresholded output image. If this is stored as a bit image, ⅛ of the original image size is required for storage, at the expense of extra processing to create the bit image. Otherwise, an amount of memory the same as the original image size is required.

The Local Dynamic Range Expansion Per-Pixel Operations Count

Operation        Count
Fetch            14
Store            1
Register copy    16
Compare          17
Increment        15
Add              1
Divide (by 2)    1

Segmentation

The segmentation algorithm takes as its input the binary thresholded image and produces a list of shapes. A shape is represented by a point list, a list of the coordinates of the pixels in the shape. The original binary image is cleared as each pixel is visited.

The segmentation algorithm proceeds by examining each pixel in the field of view. If the value of the pixel is below the threshold or if the pixel has already been assigned to an object, it proceeds to the next pixel. Otherwise, it uses the object seed fill algorithm described in Heckbert, P. S., A Seed Fill Algorithm, Graphics Gems, pp. 275-277 and 721-722, ed. Glassner A. S. (Academic Press, 1990) to determine the extent of the object. This algorithm visits each pixel a little more than twice.

The principle of the seed fill algorithm is as follows. Given a pixel in the image, the seed pixel, it finds all pixels connected to the seed pixel by progressively moving through all connected pixels in the shape. Two pixels are connected if they are horizontally or vertically adjacent. Diagonal adjacency is not considered. A pixel is in a shape if its value is above a nominated threshold. Visited pixels are set to zero so that they will be ignored if encountered again. (Note, this assumes the tag images are inverted, so that they are white on a black background.)

Starting from the seed pixel, or the first pixel it encounters in a row, it scans along the row until it finds the first pixels to either side that are not in the object, placing pixel coordinates in the point list as it proceeds. Then, for each pixel in the row segment, it examines the two vertically connected pixels. If these are in the object and have not already been visited, it first stores information on its current state, the segment details, and repeats this procedure recursively for each of these adjacent pixels.

The nature of this algorithm means it is particularly difficult to estimate its running time and memory requirements. The memory requirements can be limited by applying the target filtering to each shape as it is segmented, thus avoiding the need to store the point list of more than one shape at a time. Also, there is a maximum number of pixels that a valid target can occupy. Once this is reached, there is no need to continue storing points in the point list. Despite this, the fill procedure for each object still uses a stack with 4 bytes per entry, and this can grow to a depth of the order of half the image size, requiring roughly twice the image size in actual memory. In this extreme case, where the shape has a serpentine form occupying the entire image, each pixel is visited close to three times. As a rough estimate, of the order of 10-20 operations per pixel are required.
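The principle of segmentation by seed fill can be shown with a deliberately simplified sketch. It is not the span-based algorithm of the cited Heckbert reference: it pushes individual pixels rather than row segments onto an explicit stack, which is easier to follow but less economical. The function name is hypothetical, and the input is assumed to be a binary image in which shape pixels are 1.

```python
def segment(binary):
    """Return a list of shapes, each a list of (x, y) points.

    Pixels are 4-connected (no diagonal adjacency). The input image is
    cleared in place as pixels are visited, as described in the text.
    """
    height, width = len(binary), len(binary[0])
    shapes = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue                      # background or already visited
            binary[y][x] = 0
            stack = [(x, y)]
            points = []
            while stack:
                px, py = stack.pop()
                points.append((px, py))
                for nx, ny in ((px - 1, py), (px + 1, py),
                               (px, py - 1), (px, py + 1)):
                    if 0 <= nx < width and 0 <= ny < height and binary[ny][nx]:
                        binary[ny][nx] = 0    # mark visited before pushing
                        stack.append((nx, ny))
            shapes.append(points)
    return shapes
```

Clearing each pixel before it is pushed guarantees that no pixel enters the stack twice, so the stack depth is bounded by the number of pixels in the shape.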

Target Filtering

The target filtering step searches the shape list for shapes of suitable size and shape. A moments-based approach is used. The shape list is first culled of candidates that contain too many or too few pixels. Then the moments of each shape are calculated and if all the moments are within the specified ranges, the shape's position is placed in the candidate list. The positions are determined by calculating the centroid of the binary image of the shape, i.e. only the pixel positions are used.

The moments filtering consists of rejecting any shapes whose binary moments do not lie in certain specified ranges. (For a detailed description of moments, see Chapter 8 of Masters, T., Signal and Image Processing with Neural Networks, John Wiley and Sons, 1994.) The parameters considered are the aspect ratio, which must lie within a certain range, and the (3,0), (0,3) and (1,1) moments, all of which must be less than suitably specified maximum values. For a perfect disc, the aspect ratio is 1 and the moments are all 0, a result of the symmetry of this shape. From symmetry considerations, the minimum aspect ratio should be the reciprocal of the maximum aspect ratio. The perspective transform causes the moments and aspect ratios to vary from the ideal values. The limits on the allowed pen tilt limit these variations and so determine the permitted ranges of these parameters.
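A moments-based filter along these lines might look like the sketch below. Several details are assumptions made for illustration and are not given in the text: the aspect ratio is taken as the square root of the ratio of the (2,0) and (0,2) central moments, the moments are normalised by powers of the pixel count so the limits do not depend on size, all numeric limits are hypothetical, and floating point is used where the text calls for fixed-point arithmetic.

```python
def central_moments(points):
    """Return the centroid and the central moments needed by the filter."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    def mu(p, q):
        return sum((x - cx) ** p * (y - cy) ** q for x, y in points)

    return (cx, cy), mu(2, 0), mu(0, 2), mu(1, 1), mu(3, 0), mu(0, 3)


def is_target_candidate(points, min_pixels=20, max_pixels=200,
                        max_aspect=1.5, max_moment=0.05):
    """Accept shapes that are roughly disc-like; all limits are illustrative."""
    n = len(points)
    if not min_pixels <= n <= max_pixels:
        return False                          # too few or too many pixels
    _, mu20, mu02, mu11, mu30, mu03 = central_moments(points)
    if mu20 == 0 or mu02 == 0:
        return False                          # degenerate one-pixel-wide shape
    aspect = (mu20 / mu02) ** 0.5
    if not 1.0 / max_aspect <= aspect <= max_aspect:
        return False
    return (abs(mu11) / n ** 2 <= max_moment
            and abs(mu30) / n ** 2.5 <= max_moment
            and abs(mu03) / n ** 2.5 <= max_moment)
```

For a digitised disc the centroid falls on the centre, the aspect ratio is 1 and the three tested moments vanish, so the shape is accepted; a straight line of pixels is rejected.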

The computational cost of this step depends on the number of pixels in each shape and the number of shapes. For each shape it is necessary to first calculate the centroid, as central moments are used throughout. The operation counts for a shape are shown in the following table. There are also eight divisions per shape. The results of six of these divisions are only used in comparison tests, and so can be replaced by multiplications of the other side of the comparison. The remaining two of these divisions are required to calculate the centroid. These are divisions by N, the number of points in the shape, which can be replaced by multiplications by 1/N. The restricted range of allowed pixel counts in a shape means that 1/N can be determined from a look-up table. Because we must calculate the central moments, i.e. relative to the centroid which is non-integral, these operations must be performed using fixed-point arithmetic. A worst case is when the target candidates cover the entire image, in which case we can consider the total number of points in all the targets to be a significant fraction of the total number of pixels. However, in the cases where this occurs, it is unlikely that a valid set of targets will be found and so the search would be abandoned anyway.

The Moments-Based Target Filtering Operations Count (N is the Number of Points in the Target Candidate)

Operation    Count
Add          9/N
Multiply     5/N

An alternative to using moments is to use caliper measurements (discussed in more detail below). These require much less calculation, but are more sensitive to segmentation noise, as one pixel more or less in an object can have a significant effect. Despite this, using these measurements can produce results of comparable accuracy to those obtained using moments. However, because the target position must be known to sub-pixel accuracy, the target centroid must still be calculated.

Selecting the Targets

Given a list of target candidates, four suitable candidates must be selected as targets. A simple approach is to select the four candidates closest to the centre. Better performance is achieved by enforcing various geometric constraints on the four targets. In principle, any arrangement of four targets is feasible, but the restricted field of view and the allowable tilt range constrain the distances and angles between the targets. The procedure used is to:

-   1. Find the candidate closest to the centre;
-   2. Find the candidate closest to a specified distance from the first candidate;
-   3. Find the candidate closest to a point the specified distance from the first target along a line through the first target and perpendicular to the line between the first two targets;
-   4. Find the candidate closest to the point completing the parallelogram formed by the first three points.

At each of steps 2 to 4, the distance of the selected target from the previously selected targets must be within certain limits. If this is not the case, then a fallback procedure is used, in which the previously selected candidates are rejected and the next best candidate selected. This continues until an acceptable set of four targets has been found or the list of possible target combinations is exhausted, in which case the tag sensing fails.
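The four selection steps can be sketched as follows, without the fallback procedure. The function and parameter names are hypothetical. The sketch assumes a single tolerance for all the limit checks, assumes the tolerance is smaller than the target spacing, and tries both sides of the perpendicular in step 3, since the text does not say which side is preferred.

```python
import math


def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def select_targets(candidates, centre, spacing, tolerance):
    """Return four targets, or None when a limit check fails (no fallback)."""
    remaining = list(candidates)
    if len(remaining) < 4:
        return None

    def take(cost):
        best = min(remaining, key=cost)
        remaining.remove(best)
        return best

    # Step 1: candidate closest to the centre.
    t1 = take(lambda c: distance(c, centre))
    # Step 2: candidate whose distance from t1 is closest to the spacing.
    t2 = take(lambda c: abs(distance(c, t1) - spacing))
    if abs(distance(t2, t1) - spacing) > tolerance:
        return None
    # Step 3: points at the spacing from t1, perpendicular to the line t1-t2.
    length = distance(t1, t2)
    px = -(t2[1] - t1[1]) / length * spacing
    py = (t2[0] - t1[0]) / length * spacing
    corners = [(t1[0] + px, t1[1] + py), (t1[0] - px, t1[1] - py)]
    t3 = take(lambda c: min(distance(c, k) for k in corners))
    if min(distance(t3, k) for k in corners) > tolerance:
        return None
    # Step 4: point completing the parallelogram t1, t2, t3.
    p4 = (t2[0] + t3[0] - t1[0], t2[1] + t3[1] - t1[1])
    t4 = take(lambda c: distance(c, p4))
    if distance(t4, p4) > tolerance:
        return None
    return [t1, t2, t3, t4]
```

A fuller version would save the computed distances and, on a failed check, retry with the next best candidate instead of returning None.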

The main calculations performed in the above procedure are distance calculations. To deal with the fallback, the distances should be saved as the list of candidate targets is searched. In most cases, no fallback occurs and so the operation count is as shown in the following table. The most expensive operation is the distance calculation, which requires 2 subtractions, 2 multiplications and an addition. It is sufficient to perform the calculation using the target pixel locations, which are integers, rather than the centroid locations, which are reals, and so the calculation can be performed using integer arithmetic.

The Target Selection Operations Count (N is the Number of Target Candidates. It is Assumed no Fallback Occurs)

Operation    Count
Store        8N
Compare      7N
Add          12N
Multiply     8N

SAMPLING THE DATA BITS

To determine the bit values in the tag image, the intensity value at the predicted position of a macrodot is compared with the values at its four diagonal interstitial points. The central value is ranked against the interstitial values and the corresponding data bit assigned a value of 1 if the rank of the pixel value is large enough. Experiments indicate that a suitable minimum rank is one, i.e. if the macrodot pixel value is greater than any of the interstitial pixel values, the bit is set to one.

The predicted macrodot location is determined using the perspective transform determined from the target positions. This position is specified to sub-pixel accuracy and the corresponding intensity value is determined using bilinear interpolation.
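The ranking test and the bilinear interpolation can be sketched together. The function names are hypothetical, floating point is used in place of fixed point, and the sample positions are assumed to have already been mapped into image coordinates by the perspective transform and to lie at least one pixel inside the right and bottom edges. As elsewhere in this section, the image is assumed inverted, so that a printed macrodot is bright.

```python
def bilinear(image, x, y):
    """Interpolate the intensity at the sub-pixel position (x, y)."""
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x0 + 1] * fx
    bottom = image[y0 + 1][x0] * (1 - fx) + image[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def sample_bit(image, dot_position, interstitial_positions, min_rank=1):
    """Return 1 when the macrodot value outranks enough interstitial values."""
    centre = bilinear(image, *dot_position)
    rank = sum(1 for position in interstitial_positions
               if centre > bilinear(image, *position))
    return 1 if rank >= min_rank else 0
```

With the minimum rank of one suggested by the text, a single interstitial point darker than the macrodot position is enough to set the bit.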

The square tag design described in PCT Patent Application PCT/AU00/01111 and illustrated in FIGS. 120 and 121 has 240 macrodots and 304 interstitial positions. Thus, 544 perspective transforms and bilinear interpolations are required. The following table shows the operation counts for this process. All these operations are fixed-point operations. Given the number of intensity values that must be sampled and their compactness in the image domain, it may be worthwhile to transform the image values into the tag coordinate domain using the approaches described in Section 7.6, pp. 240-260, of Wolberg, G., Digital Image Warping, IEEE Computer Society Press, 1990.

The Data Bit Sampling Operations Count (N is the Required Number of Intensity Samples)

Operation     Count
Fetch         4N
Add           14N
Multiply      11N
Reciprocal    N

DECODING THE TAG DATA

In the square tag design described in PCT application PCT/AU00/01111 and illustrated in FIGS. 120 and 121, the tag data is encoded using a (15,7) Reed-Solomon code in GF(16). There are four codewords, each containing fifteen 4-bit symbols 92 that are distributed across the tag area. In FIG. 120, one of the four codewords is indicated by bold outlines 94 around each of its symbols. The decoding procedure uses Euclid's algorithm, as described in Section 9.2.3, pp. 224-227, of Wicker, B. W., Error Control Systems for Digital Communication and Storage, Prentice Hall, 1995. This is unlikely to require much in the way of computation or memory to implement. A slightly more efficient algorithm, the Berlekamp-Massey algorithm (Section 9.2.2, pp. 217-224, of Wicker, B. W., ibid), can also be used.

DETERMINING THE PEN POSITION AND ORIENTATION

Given the perspective transform, as determined from the target positions in the image, together with the geometry of the pen, one can determine the pen position and orientation using the direct procedure described in PCT Application PCT/AU00/00568, or the iterative least-squares procedure described in US Patent Application filed 4 Dec. 2002 with U.S. patent application Ser. No. 10/309,358.

PERFORMANCE AND RUNNING TIME OF THE ALGORITHM

From the point of view of computation and memory, the most expensive processing steps are the local dynamic range expansion preprocessing and the subsequent segmentation, as these two steps are applied to the full-resolution image. The memory requirements for these two steps are roughly three times the size of the image in pixels, assuming that the range-expanded image is thresholded as it is formed, and so requires ⅛ of the memory of the input image. If the thresholded image is stored in unpacked form, i.e. one byte per binary pixel, then a total of four times the image size will be required. This factor includes the storage of the original image in memory, which must be preserved for the later macrodot sampling. The local dynamic range expansion step requires of the order of 65 operations per pixel.

Considering a circular image field of diameter 128 pixels (corresponding to 12 900 pixels), adequate for decoding the macrodots, acquired at 100 frames per second, and a processor with a clock frequency of 70 MHz such as the ARM7, then there are 55 clock cycles per pixel. This is insufficient for performing the initial dynamic range expansion step, let alone the segmentation. 40 000 bytes of memory are required for the two initial steps, which becomes 52 000 bytes if the thresholded image is stored in unpacked form. Clearly, the only way the algorithm can be used as described is to use a faster processor or, alternatively, to provide hardware support for the local dynamic range expansion step. The expensive local dynamic range expansion step is used to allow some tolerance of shadowing and general variations in illumination within the captured image. Even using local dynamic range expansion, shadows may still be a problem, depending on the relative intensities of controlled light source illumination and uncontrolled ambient illumination. Generally, errors occur where a shadow boundary intersects a target.

After local dynamic range expansion, the segmentation operation still remains. This requires from 10-20 operations per pixel. Since a large proportion of the algorithm involves memory access, this translates to 20-40 processor cycles with our example ARM7 processor. In the worst case, the moments calculation requires roughly 13 operations per pixel, requiring 25 processor cycles. Hence, using these rough estimates, these two operations alone consume all of the 55 available processor cycles, leaving nothing for the remaining steps or for other processor tasks.
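The budget figures quoted above can be reproduced approximately with a few lines of arithmetic. The memory breakdown used here (the original image, the thresholded image, and a fill stack of about twice the image size) is an assumed reading of the estimates in this section, and the function name is hypothetical.

```python
import math

FRAME_RATE_HZ = 100
CLOCK_HZ = 70_000_000
DIAMETER_PIXELS = 128
NOMINAL_PIXELS = 12_900          # rounded pixel count used in the text


def budget():
    """Return the pixel count, cycle budget and memory estimates."""
    pixels = math.pi * (DIAMETER_PIXELS / 2) ** 2
    cycles_per_pixel = CLOCK_HZ / (FRAME_RATE_HZ * NOMINAL_PIXELS)
    # Original image + thresholded image + fill stack (about 2x image size).
    memory_packed = NOMINAL_PIXELS * (1 + 0.125 + 2)
    memory_unpacked = NOMINAL_PIXELS * (1 + 1 + 2)
    return {
        "pixels": pixels,
        "cycles_per_pixel": cycles_per_pixel,
        "memory_packed_bytes": memory_packed,
        "memory_unpacked_bytes": memory_unpacked,
    }
```

The circular field contains about 12 870 pixels and the budget comes out at a little over 54 cycles per pixel, in line with the rounded figures of 12 900 pixels and 55 cycles. The memory estimates of about 40 300 and 51 600 bytes correspond to the quoted 40 000 and 52 000 bytes.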

SUMMARY AND CONCLUSION

In this section the problem of sensing and decoding Netpage tags in the presence of shadowing has been examined. A relatively simple approach to dealing with shadows in the image has been described and analysed. It is clear that even this simple approach is likely to require special-purpose hardware support.

If the controlled pen illumination is sufficiently intense compared with uncontrolled ambient illumination, then shadows are less of a problem, and a simple global threshold may be used, remembering that the main purpose of the dynamic range expansion step is to determine a threshold for the subsequent segmentation step. The required global threshold can be determined by constructing a cumulative histogram of the image as described below. Experiments show that in the absence of shadows, such an algorithm gives a tag sensing error rate close to zero. If required, hardware support for this would be relatively simple to provide, involving little more than memory access and incrementing. Even without hardware support, this operation would require only 6 operations per pixel to construct the initial histogram. For the ARM7 this translates to 10 cycles per pixel.

Even with this increased illumination, it is still difficult to perform the required processing in the available time, motivating a modified approach. The problem is that the early processing operations all have a running time of the order of the number of pixels in the image. For the example above, there are 12 900 pixels. The number of pixels required is determined by the need to be able to resolve the macrodots which carry the data. The tag targets are roughly twice the size of the macrodot spacing, and can still be resolved at half the resolution, that is, with one quarter of the pixels. Hence an image of 3 200 pixels should be adequate for finding the targets. Techniques for finding the targets using low-resolution images are discussed in the following section.

Finding the Targets Using Low-Resolution Images

In this approach, a lower-resolution image is used to determine the regions of most interest in an image, which are then examined at higher resolution. While we should be able to find the targets using a half-resolution image, to determine the tag macrodot bit values we need the target positions to sub-pixel accuracy at the full image resolution. As a result, the modified search procedure consists of first finding target candidates using a low-resolution image and then using the full-resolution image to make the final target selection and to determine their positions to the desired precision.

With this in mind, this section describes algorithms for finding the targets using half-resolution and third-resolution images. The process of finding the targets is largely identical to that described above and so we only examine the steps in the algorithm which differ. The main challenge is to determine the target positions accurately from the high-resolution images, using the results of the low-resolution steps, in a manner which does not squander the savings gained from using a low-resolution image in the first place.

Unlike the algorithm described above, the algorithms described here are not designed for images with strong shadows. In practice, this means we are assuming the controlled illumination is sufficient to swamp the ambient illumination, and hence suppress shadows due to ambient illumination.

DOWN-SAMPLING

In general, down-sampling involves forming a weighted sum of the high-resolution pixels in some window about the location of the down-sampled pixel, corresponding to lowpass filtering followed by re-sampling. Since the aim of down-sampling is to reduce the computational burden, we should use the simplest scheme possible. This is to down-sample by an integral factor, which only requires averaging the pixels in a square window of a suitable size.

This scheme can easily be implemented in hardware. By suitable organisation of the frame buffer, the low-resolution image can be stored in a virtual frame buffer where the pixel values are accessed as notional memory locations within a few processor clock cycles. The pixel values are calculated as required.
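A minimal software sketch of down-sampling by an integral factor is given below for illustration. It assumes the image is held as a list of rows of integer grey levels, that partial blocks at the right and bottom edges are simply dropped, and that an integer mean is acceptable; none of these details is specified in the text, and the function names are hypothetical. The per-pixel function mirrors the idea of calculating low-resolution pixel values only as they are required.

```python
def downsampled_pixel(image, k, row, col):
    """Return the mean of the k-by-k block that maps to low-resolution (row, col)."""
    total = 0
    for dr in range(k):
        for dc in range(k):
            total += image[row * k + dr][col * k + dc]
    # Integer mean (assumption: fixed-point arithmetic is sufficient here).
    return total // (k * k)


def downsample(image, k):
    """Down-sample a whole image by the integral factor k.

    Assumption: rows and columns that do not fill a complete block are dropped.
    """
    rows = len(image) // k
    cols = len(image[0]) // k
    return [[downsampled_pixel(image, k, r, c) for c in range(cols)]
            for r in range(rows)]


full = [
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [4, 4, 255, 255],
    [4, 4, 255, 255],
]
print(downsample(full, 2))  # prints [[0, 8], [4, 255]]
```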

The following table shows the operations count for down-sampling as a function of the number of pixels in the full-resolution image and of the down-sampling factor. Assuming an ARM7 processor, this comes out as 5N+5N/k² cycles overall, where N is the number of pixels in the image and k is the down-sampling factor.

The Down-Sampling Operations Count Per Down-Sampled Pixel (N is the Number of Pixels in the Full-Resolution Image and k is the Down-Sampling Factor)

Operation   Count
Fetch       N
Store       N/k²
Add         2N + N/k²
Compare     N/k²
Multiply    N/k²

FINDING THE TARGETS

Introduction

The approach to finding the targets at low resolution is essentially the same as that used previously, with two changes. First, global dynamic range expansion is tried, rather than local dynamic range expansion, as we are relying on artificial illumination sufficient to substantially eliminate shadows. Second, caliper measurements are used to filter the targets, rather than the moments-based filtering described above.

Global Dynamic Range Expansion

The global dynamic range expansion process is similar to the local dynamic range expansion process described above. The difference is that a histogram of the entire area of interest is taken and it is from this histogram that the transfer function is determined. This single transfer function is then used for the entire area of interest.

As with local dynamic range expansion, since we are only interested in the thresholded image, we can use the inverse transfer function to determine a threshold level. This single threshold level is then applied to the entire area of interest.

As there are generally far more pixels in the area of interest than in the 5 by 5 window used for local dynamic range expansion as described above, the entire histogram must normally be constructed. The computational cost of global dynamic range expansion is quite low, as each pixel is only visited twice: once to construct the histogram and a second time to apply the threshold. The following table summarises the operations count for global dynamic range expansion.

The Global Dynamic Range Expansion Operations Count. N is the Number of Pixels.

Operation   Count
Fetch       2N
Store       N
Increment   2N
Compare     N
Add         N

This adds up to roughly 12 cycles per pixel on the ARM7 processor.
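The two-pass structure of global dynamic range expansion (one pass to build the histogram, one pass to apply the threshold) might be sketched as follows. The rule used here to derive the threshold is an assumption made only for illustration: the black and white levels are taken at hypothetical low and high fractions of the cumulative histogram, and the threshold is placed midway between them. The transfer function actually used is the one described for local dynamic range expansion, which is not reproduced in this section. All function and parameter names are hypothetical.

```python
def global_threshold(pixels, low_frac=0.05, high_frac=0.95, levels=256):
    """Derive one threshold for the whole area of interest.

    Assumption: low_frac and high_frac are illustrative parameters only.
    """
    # Pass 1: histogram (one fetch and one increment per pixel).
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative histogram: cum[v] is the number of pixels with value <= v.
    cum = []
    total = 0
    for count in hist:
        total += count
        cum.append(total)

    def level_at(frac):
        # Smallest grey level whose cumulative count reaches frac of all pixels.
        target = frac * total
        for value, c in enumerate(cum):
            if c >= target:
                return value
        return levels - 1

    black = level_at(low_frac)    # assumed dark end of the transfer function
    white = level_at(high_frac)   # assumed bright end of the transfer function
    # Assumed inverse transfer function at mid-scale: midway between the levels.
    return (black + white) // 2


def apply_threshold(pixels, threshold):
    # Pass 2: 1 marks a dark pixel (candidate target), 0 marks background.
    return [1 if p <= threshold else 0 for p in pixels]


area = [10] * 20 + [200] * 80     # synthetic area of interest: 20 dark, 80 bright
t = global_threshold(area)
print(t, sum(apply_threshold(area, t)))  # prints 105 20
```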

Caliper Based Target Filtering

At the resolutions considered here, i.e. roughly the macrodot spacing, a target is only two to three pixels in diameter, depending on the pen tilt and its position in the field of view. The segmented images of a target can vary by the addition or deletion of a single pixel, and at lower resolutions this can make it difficult to set useful limits for the moments. For example, at these resolutions, a segmented target can consist of three pixels in an L-shaped configuration. To deal with this problem, rather than use moments, we use caliper measurements for the target filtering.

Caliper filtering consists of examining the maximum extent of the shape in various directions. The parameters of the shape that are considered are its width, its height and its area, i.e. the number of pixels it contains. The tests are:

1. that the number of pixels in the shape is in a specified range;
2. that the width and height are in a specified range;
3. that the width to the height ratio is within a specified range;
4. that the fill factor is large enough.

As for moments-based filtering, we first test for the number of pixels in the shape. The tests for the width to height ratios are

(width − 1) ≤ (maximum aspect ratio) × (height + 1)

and

(height − 1) ≤ (maximum aspect ratio) × (width + 1)

The additions and subtractions of 1 are to compensate for the spurious inclusion or exclusion of pixels into or out of the shape. For the fill factor the test is

Area ≥ (minimum fill factor) × (width − 1) × (height − 1)

where again, we have subtracted 1 from the width and height to avoid the effects of the spurious inclusion of pixels into the shape.
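The four caliper tests can be sketched as below. This is an illustrative sketch only: a shape is assumed to be a list of (x, y) pixel coordinates, the limit values in the usage example are hypothetical, and the second aspect test is taken as the symmetric counterpart of the first, with (width + 1) on its right-hand side, which is an assumption about the intended form.

```python
def caliper_features(points):
    """Return (width, height, area) of a shape given as (x, y) pixel coordinates."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    return width, height, len(points)


def passes_caliper_tests(points, area_range, size_range, max_aspect, min_fill):
    """Apply the four caliper tests; all limits are supplied by the caller."""
    width, height, area = caliper_features(points)

    # Test 1: number of pixels within range.
    if not area_range[0] <= area <= area_range[1]:
        return False
    # Test 2: width and height within range.
    if not (size_range[0] <= width <= size_range[1]
            and size_range[0] <= height <= size_range[1]):
        return False
    # Test 3: aspect ratio, with the +/-1 allowance for spurious pixels.
    if (width - 1) > max_aspect * (height + 1):
        return False
    if (height - 1) > max_aspect * (width + 1):   # assumed symmetric form
        return False
    # Test 4: fill factor, again with 1 subtracted from width and height.
    return area >= min_fill * (width - 1) * (height - 1)


# Three pixels in an L-shaped configuration, as mentioned in the text.
l_shape = [(0, 0), (0, 1), (1, 1)]
# A thin ten-pixel line, which should fail the aspect test.
line = [(x, 0) for x in range(10)]

# Hypothetical limits chosen only for this example.
limits = dict(area_range=(2, 15), size_range=(1, 12), max_aspect=1.5, min_fill=0.5)
print(passes_caliper_tests(l_shape, **limits))  # prints True
print(passes_caliper_tests(line, **limits))     # prints False
```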

The following table gives the operation count for finding the height and width of a candidate target.

The Operations Count to Find the Height and Width of a Candidate Target (N is the Number of Points in the Object)

Operation       Count
Fetch           2N
Register Copy   N
Compare         3N
Add             3N

For the ARM7, this works out as 13 cycles per point in the segmented object. There may be up to 15 points per object in a half-resolution image.

The following table shows the operations count for calculation of the caliper features.
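The cycle figures quoted alongside the operation-count tables (5N+5N/k² for down-sampling, roughly 12 cycles per pixel for global dynamic range expansion, and 13 cycles per point for finding the height and width) can be reproduced with a simple cost model. The per-operation costs below are an assumption, chosen only because they reproduce the quoted totals: 3 cycles per fetch, 2 per store and 1 for every other operation. The down-sampling Add entry is read here as 2N + N/k². The names are hypothetical.

```python
# ASSUMED per-operation cycle costs; not stated in the text.
ASSUMED_CYCLES = {"fetch": 3, "store": 2}
DEFAULT_CYCLES = 1  # assumed cost of add, compare, increment, copy, multiply


def cycles(op_counts):
    """Total cycles for a table of operation counts under the assumed costs."""
    return sum(count * ASSUMED_CYCLES.get(op, DEFAULT_CYCLES)
               for op, count in op_counts.items())


# Operation counts from the tables, expressed per pixel or per point.
GLOBAL_DRE_PER_PIXEL = {"fetch": 2, "store": 1, "increment": 2, "compare": 1, "add": 1}
EXTENT_PER_POINT = {"fetch": 2, "register_copy": 1, "compare": 3, "add": 3}
DOWNSAMPLE_PER_INPUT_PIXEL = {"fetch": 1, "add": 2}
DOWNSAMPLE_PER_OUTPUT_PIXEL = {"store": 1, "add": 1, "compare": 1, "multiply": 1}


def downsample_cycles(n, k):
    """Cycles to down-sample an n-pixel image by factor k: 5N + 5N/k^2."""
    return (cycles(DOWNSAMPLE_PER_INPUT_PIXEL) * n
            + cycles(DOWNSAMPLE_PER_OUTPUT_PIXEL) * n / (k * k))


print(cycles(GLOBAL_DRE_PER_PIXEL))   # prints 12
print(cycles(EXTENT_PER_POINT))       # prints 13
print(downsample_cycles(12900, 2))    # prints 80625.0
```

Under the same assumptions, a worst-case object of 15 points costs 15 × 13 = 195 cycles to measure.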

The Operations Count to Calculate the Caliper Features

Operation   Count
Compare     3
Add         4
Multiply    4

DETERMINING THE TARGET POSITIONS

To determine the precise centre of the targets we calculate the grey-scale centroid in the high-resolution image, as opposed to the binary centroid used above. The centroid is calculated in a circular window about the target position determined from the low-resolution image.

Ideally, the size of the circular window is chosen so as to guarantee including the entire target while excluding any nearby macrodots. The difficulty of satisfying both conditions at once is a minor weakness of this technique. The combination of the low resolution and the noisiness of the low-resolution segmented image means that the target position, as determined from the low-resolution image, can be quite inaccurate. If the window is to be large enough to encompass the entire target, taking into account any inaccuracy in the positioning of its centre, then it will inevitably include some of the surrounding macrodots.
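A grey-scale centroid over a circular window might be computed as in the sketch below. Because the targets are black, each pixel is weighted here by its darkness (an assumed white level minus its intensity); this weighting, the integer window centre and radius, and the function name are assumptions made for illustration and are not specified in the text.

```python
def grey_centroid(image, cx, cy, radius, white=255):
    """Darkness-weighted centroid inside a circular window centred on (cx, cy).

    Assumption: weight = white - intensity, so dark target pixels dominate.
    Returns (x, y) as floats, or None if the window contains no dark pixels.
    """
    sum_w = sum_x = sum_y = 0
    r2 = radius * radius
    for y in range(max(0, cy - radius), min(len(image), cy + radius + 1)):
        for x in range(max(0, cx - radius), min(len(image[0]), cx + radius + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 > r2:
                continue  # outside the circular window
            w = white - image[y][x]
            sum_w += w
            sum_x += w * x
            sum_y += w * y
    if sum_w == 0:
        return None
    return sum_x / sum_w, sum_y / sum_w


# 7-by-7 white image with two black pixels at (3, 3) and (4, 3).
img = [[255] * 7 for _ in range(7)]
img[3][3] = 0
img[3][4] = 0
print(grey_centroid(img, 3, 3, 2))  # prints (3.5, 3.0)
```

Any macrodot that falls inside the window contributes weight in the same way as the target, which is the source of the inaccuracy noted above.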

IMPROVED TARGET LOCATION

A simple approach to improving the estimates of the target locations is to use the same algorithm as used for high-resolution images, except that it is applied only in a small window around the target positions in the full-resolution image. The window positions are determined from the low-resolution images.

The histogram of a small circular region around a candidate target is taken and used to set a threshold, as described above, i.e. we use global dynamic range expansion within the window. An additional form of target filtering is then applied before the segmentation. Remembering that the targets are black, if the intensity of the pixel at the centre of the window is higher than the threshold for the window, the candidate is rejected and segmentation is not performed. Otherwise, the image within the window is segmented.

This segmentation starts at the centre of the window. Unlike the general segmentation applied to the entire image, it is sufficient to extract the single shape at the centre of the window. The position of the target is then given by the binary centroid of the extracted shape.
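The windowed procedure (reject the candidate if its centre pixel is brighter than the window threshold, otherwise grow the single shape from the centre and take its binary centroid) can be sketched as follows. The sketch assumes the threshold for the window has already been determined, uses 4-connected region growing, and confines growth to the circular window; the connectivity rule and all names are assumptions made for illustration.

```python
from collections import deque


def locate_target(image, cx, cy, radius, threshold):
    """Return the binary centroid of the dark shape at the window centre.

    Returns None when the centre pixel is brighter than the threshold, in
    which case the candidate is rejected without segmentation.
    """
    if image[cy][cx] > threshold:
        return None

    r2 = radius * radius
    shape = {(cx, cy)}
    queue = deque([(cx, cy)])
    while queue:
        x, y = queue.popleft()
        # Assumption: 4-connected neighbours define a single shape.
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (nx, ny) in shape:
                continue
            if not (0 <= ny < len(image) and 0 <= nx < len(image[0])):
                continue
            if (nx - cx) ** 2 + (ny - cy) ** 2 > r2:
                continue  # outside the circular window
            if image[ny][nx] > threshold:
                continue  # background pixel
            shape.add((nx, ny))
            queue.append((nx, ny))

    n = len(shape)
    return sum(x for x, _ in shape) / n, sum(y for _, y in shape) / n


# 7-by-7 white image with a 2-by-2 black target and one isolated black pixel.
win = [[255] * 7 for _ in range(7)]
for yy in (3, 4):
    for xx in (3, 4):
        win[yy][xx] = 0
win[1][5] = 0  # nearby dark pixel inside the window but not connected

print(locate_target(win, 3, 3, 3, 128))  # prints (3.5, 3.5)
print(locate_target(win, 1, 1, 3, 128))  # prints None
```

Because growth starts at the centre and follows connected dark pixels only, the isolated dark pixel inside the window does not disturb the centroid.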

As pointed out above, most of the errors of the simple low-resolution algorithm are due to poor location of the targets. However, a significant number of errors are due to target misidentification. To ameliorate this, the segmented high-resolution shape is subjected to further filtering using moments. Only targets that pass the moments criteria are considered for the final target selection process which, as before, is based on geometric constraints.

PERFORMANCE OF THE IMPROVED LOW-RESOLUTION ALGORITHM

Similar performance is obtained using third-resolution images with 1/9 the number of pixels. Quarter-resolution images are not so successful, since at this resolution the targets are reduced to single pixels. Improved performance at quarter resolution might be obtained by higher-quality filtering before down-sampling. However, this filtering would have to be performed in hardware for this approach to be practical, as the filter templates are likely to be of the order of 8 by 8 pixels in size. Even taking into account the gains due to down-sampling, this would require excessive processing resources from a general-purpose processor such as the ARM7.

Examining the numbers of candidate targets that pass each of the filtering steps provides some interesting insights. First, at low resolution, the caliper tests play no part in reducing the number of target candidates. Any reduction in the number of candidates is due to selecting only candidates with suitable sizes. By size, we mean the number of pixels covered by the candidate. By contrast, many target candidates are eliminated because the intensity of their centre pixel in the full-resolution image is too great (remembering that the targets are black).

APPLYING LOCAL DYNAMIC RANGE EXPANSION TO THE LOW-RESOLUTION IMAGE

The algorithm described so far can be further improved. Pen-controlled illumination is still typically subject to variation within the field of view due to such factors as pen tilt. To overcome the effects of non-uniform illumination, local dynamic range expansion is applied to the low-resolution images rather than the global dynamic range expansion described above. The local dynamic range expansion is exactly as described above. The same parameters are used, noting that the dynamic range expansion radius is in terms of the low-resolution pixels. The cost of local dynamic range expansion is acceptable here because of the greatly reduced number of pixels in the low-resolution image.

1. A monolithic integrated circuit for use in a system having a host processor, the integrated circuit including: at least one input pin for receiving command data from the host processor; at least one output pin for transmitting processed image data to the host processor in response to the command data; and an image processor configured to generate the processed image data by performing an image-processing function on image data captured by an image sensor.
2. A monolithic integrated circuit according to claim 1, wherein the image processor is configured to generate the processed image data by selecting a subregion of the image data.
3. A monolithic integrated circuit according to claim 1, wherein the image processor is configured to generate the processed image data by low-pass filtering the image data.
4. A monolithic integrated circuit according to claim 1, wherein the image processor is configured to generate the processed image data by sub-sampling the image data.
5. A monolithic integrated circuit according to claim 1, wherein the image processor is configured to generate the processed image data by thresholding the image data.
6. A monolithic integrated circuit according to claim 1, wherein the image processor is configured to generate the processed image data by range-expanding the image data.
7. A monolithic integrated circuit according to claim 1, wherein the image processor is configured to generate the processed image data by low-pass filtering and sub-sampling the image data.
8. A monolithic integrated circuit according to claim 1, wherein the image processor is configured to generate the processed image data by low-pass filtering, sub-sampling and range expanding the image data.
9. A monolithic integrated circuit according to claim 1, wherein the image processor is configured to generate the processed image data by low-pass filtering, sub-sampling and thresholding the image data.
10. A monolithic integrated circuit according to claim 1, wherein the image processor is configured to generate the processed image data by low-pass filtering, sub-sampling, range expanding and thresholding the image data.
11. A monolithic integrated circuit according to any one of the preceding claims, wherein the command data identifies the at least one function.
12. A monolithic image converter according to claim 1, wherein the command data further identifies at least one parameter of the image-processing function.
13. A monolithic image converter according to claim 11, wherein the command data further identifies at least one parameter of the image-processing function.
14. A monolithic image converter according to claim 1, further including the image sensor.
15. A monolithic image converter according to claim 1 or 14, further including a first framestore for storing the image data.
16. A monolithic image converter according to claim 1, 14 or 15, further including a second framestore for storing the processed image data.
17. A monolithic integrated circuit according to claim 1, including: an image sensor for capturing image information; at least one analog to digital converter for converting analog signals corresponding to the image information into digital image data; and a first framestore for storing frames of the digital image data.
18. A monolithic integrated circuit according to claim 1, including: an image sensor for capturing image information; at least one analog to digital converter for converting analog signals corresponding to the image information into digital image data; and an image processor, the image processor including a low-pass filter for filtering the image data, thereby to generate filtered image data.
19. A monolithic integrated circuit according to claim 1, comprising: an image sensor for sensing image information; at least one analog to digital converter for converting analog signals corresponding to the image information into digital image data; and an image processor, the image processor including a range expansion circuit for range expanding the digital image data.
20. A monolithic integrated circuit according to claim 1, including an image sensor having a plurality of photodetecting circuits, each of the photodetecting circuits comprising: a photodetector for generating a signal in response to incident light; a storage node having first and second node terminals, the first node terminal being connected to the photodetector to receive the signal such that charge stored in the node changes during an integration period of the photodetecting circuit; and an output circuit for generating an output signal during a read period of the photodetecting circuit, the output signal being at least partially based on a voltage at the first terminal; the photodetecting circuit being configured to: receive a reset signal; integrate charge in the storage node during an integration period following receipt of the reset signal; and receive a compensation signal at the second terminal of the storage node at least during the read period, the compensation signal increasing the voltage at the first terminal whilst the output circuit generates the output signal.
21. A monolithic integrated circuit according to claim 1, including: (a) an image sensor for sensing image data; (b) timing circuitry for generating: at least one internal timing signal, the image sensor being responsive to at least one of the internal timing signals to at least commence sensing of the image data; and at least one external timing signal; (c) at least one external pin for supplying the at least one external timing signal to at least one peripheral device.
22. A monolithic integrated circuit according to claim 1, wherein the image processor is configured to make each of a series of frames of image data available to a host processor, the image processor being configured to: receive a first message from the host processor indicative of the host processor not requiring further access to the image data prior to a subsequent frame synchronisation signal; in response to the first message, causing at least part of the integrated circuit to enter a low power mode; and in response to a frame synchronisation signal, cause the part of the integrated circuit in the low power mode to exit the low power mode.
23. A monolithic integrated circuit according to claim 1, the image processor being configured to: receive, from the host processor, a request for access to a next available frame of image data from a framestore; in the event the frame of image data is available, sending a message to the host processor indicative of the image data's availability; and in the event the frame of image data is not available, waiting until it is available and then sending a message to the host processor indicative of the image data's availability.